Abstract

For a given statistical model, it is often necessary to intervene in the model to reduce the variances of the output variables. In structural equation models, this can be done by changing the values of the path coefficients through an intervention. First, we explain that the expectations and the variance matrix can be decomposed into several parts in terms of the total effects. Then, we show that the problem of finding the intervention that minimizes the weighted sum of the variances can be formulated as a convex quadratic programming problem. This formulation allows us to impose bounds on the intervention, so that we can find practical solutions. We also treat the problem of adjusting the expectations to target values.

Key words: Convex quadratic programming; Structural equation models; Total effects. 

1 Introduction 



The methods of structural equation models (SEMs), developed by geneticists (Wright (1923)) and economists (Haavelmo (1943) and Koopmans (1949)), are widely used as analytical tools in many fields, including genetics, econometrics, the social sciences and statistical quality control. To meet the demands of practitioners, thousands of studies on parameter estimation and model fitting for structural equation models have been carried out.

However, structural equation models are more than tools for analysis. We can also use them to represent the causal relationships among variables (Pearl (2009)).



If we intervene in a part of the causal structure, the overall causal structure changes. By using a structural equation model that represents the correct causal relationships, we can evaluate the amount of change caused by the intervention. This means that we can compute the optimal intervention that minimizes the variance of a variable. Kuroki and Miyakawa (2003), Kuroda et al. (2006) and Kuroki (2008) evaluated the intervention effect on the variance of a variable and gave some methods to obtain the optimal intervention that minimizes the variance. However, their methods are difficult to use in practice because they implicitly rely on the impractical assumption that the intervention can be made freely without any constraint (e.g., the change of a parameter of a structural equation may be bounded because of the cost of changing it).

In this paper, we formulate the problems of obtaining the optimal intervention that minimizes the variances and of adjusting the expectations as convex quadratic programming problems. This formulation enables us to impose bounds on the interventions easily. To this end, we first introduce some ideas on the decomposition of total effects in Section 2. Note that the term "decomposition of total effects" refers not only to the decomposition of total effects into direct and indirect effects, but also to decompositions by paths or by sets of variables. We also explain that the expectations and the variance matrix can be decomposed into several parts in terms of the total effects. In Section 3, we show that the problem of obtaining the optimal intervention that minimizes the variances can be formulated as a convex quadratic programming problem. We also treat the problem of adjusting the expectations. Next, in Section 4, we show how the algorithms proposed in Section 3 work by using a toy model. Finally, we give some discussion in Section 5.



2 Decomposition of total effects and Interventions 

First, in Section 2.1, we briefly review structural equation models and path diagrams, and introduce some notation. Next, in Section 2.2, we introduce the matrix representation of total effects and their decomposition. The idea of decomposing total effects is essential for the optimal interventions treated in Section 3. Finally, in Section 2.3, we explain interventions in structural equation models.

2.1 Structural Equation Models 

Models in which the relations among random variables are described by linear equations are called structural equation models. To explain some terms and notation, let us consider an example of a structural equation model.

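Example 1. Assume that five random variables $T$, $X_1$, $X_2$, $S_1$ and $S_2$ are generated by the following linear structural equations:

$$\begin{aligned}
T &= \mu_{t;\mathrm{pa}(t)} + \varepsilon_{t;\mathrm{pa}(t)},\\
X_1 &= \mu_{x_1;\mathrm{pa}(x_1)} + \alpha_{x_1t}T + \varepsilon_{x_1;\mathrm{pa}(x_1)},\\
X_2 &= \mu_{x_2;\mathrm{pa}(x_2)} + \alpha_{x_2t}T + \alpha_{x_2x_1}X_1 + \varepsilon_{x_2;\mathrm{pa}(x_2)},\\
S_1 &= \mu_{s_1;\mathrm{pa}(s_1)} + \alpha_{s_1x_1}X_1 + \varepsilon_{s_1;\mathrm{pa}(s_1)},\\
S_2 &= \mu_{s_2;\mathrm{pa}(s_2)} + \alpha_{s_2t}T + \alpha_{s_2x_2}X_2 + \alpha_{s_2s_1}S_1 + \varepsilon_{s_2;\mathrm{pa}(s_2)},
\end{aligned}$$

where:

• $\mu_{t;\mathrm{pa}(t)}, \ldots, \mu_{s_2;\mathrm{pa}(s_2)}$ are the intercepts;

• $\alpha_{x_1t}, \ldots, \alpha_{s_2s_1}$ are proportionality coefficients called path coefficients;

• $\varepsilon_{t;\mathrm{pa}(t)}, \ldots, \varepsilon_{s_2;\mathrm{pa}(s_2)}$ are the error terms.

We will soon explain the meaning of subscripts such as $x_2;\mathrm{pa}(x_2)$.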




Figure 1: An example of a path diagram

In the above equations, we presume that each left-hand side is determined by the corresponding right-hand side, i.e., the right-hand sides are causes and the left-hand sides are effects. If we represent a causal effect by an arrow labeled with its path coefficient, the relations among the random variables $T$, $X_1$, $X_2$, $S_1$ and $S_2$ can be represented graphically as in Figure 1. This graph is called a path diagram.

The arrow from $T$ to $X_2$ represents a presumed direct causal effect of $T$ on $X_2$. For this arrow, $T$ is said to be a parent of $X_2$, and conversely, $X_2$ is said to be a child of $T$. These are graph-theoretic terms. Here, $X_2$ has two parents, $T$ and $X_1$, and we denote them by $\mathrm{pa}(x_2)$, an abbreviation for the parents of $X_2$. Furthermore, the notation $;\mathrm{pa}(x_2)$ indicates that the effect of $\mathrm{pa}(x_2)$ is removed. Thus, $\mu_{x_2;\mathrm{pa}(x_2)}$ represents the mean of $X_2$ when the effects of the parents of $X_2$ are removed. We also use the graph-theoretic terms ancestor and descendant. For example, the ancestors of $S_1$ are $X_1$ and $T$, and the descendants of $X_1$ are $X_2$, $S_1$ and $S_2$. □

We now formulate the general structural equation model in a way that is convenient for the calculations of total effects, means and variances treated in Section 2.2. Consider a random vector $V$ whose elements are generated by linear structural equations. We divide $V$ into three disjoint parts, $T$, $X$ and $S$, so that the elements of $T$ are the variables outside $X$ that are ancestors of some element of $X$, and the elements of $S$ are the remaining variables, i.e., those that are neither elements of $X$ nor ancestors of elements of $X$. This decomposition is uniquely determined once we choose $X \subset V$.
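The partition can be computed mechanically from the parent sets. The following Python sketch is only an illustration: the function name `partition` and the encoding of a path diagram as a dictionary of parent lists are our own choices, not part of the model. It collects all ancestors of the chosen $X$ by a depth-first search over parent links and assigns every remaining variable outside $X$ to $S$.

```python
def partition(parents, X):
    """Split the variables of an acyclic SEM into (T, X, S) for a chosen X.

    parents maps every variable name to the list of its parents.
    T: variables outside X that are ancestors of some element of X.
    S: all remaining variables (neither in X nor ancestors of X).
    """
    X = set(X)
    ancestors = set()
    stack = [p for x in X for p in parents[x]]
    while stack:  # depth-first search over parent links
        v = stack.pop()
        if v not in ancestors:
            ancestors.add(v)
            stack.extend(parents[v])
    T = ancestors - X
    S = set(parents) - X - T
    return T, X, S


# Parent sets read off the structural equations of Example 1.
parents_ex1 = {
    "T": [],
    "X1": ["T"],
    "X2": ["T", "X1"],
    "S1": ["X1"],
    "S2": ["T", "X2", "S1"],
}
print(partition(parents_ex1, {"X1", "X2"}))
```

For Example 1 this returns $T = \{T\}$ and $S = \{S_1, S_2\}$, matching the partition used below.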

Now, a structural equation model can be represented by using vectors and matrices as follows:

$$\begin{pmatrix} T\\ X\\ S \end{pmatrix} = \begin{pmatrix} \mu_{t;\mathrm{pa}(t)}\\ \mu_{x;\mathrm{pa}(x)}\\ \mu_{s;\mathrm{pa}(s)} \end{pmatrix} + \begin{pmatrix} A_{tt} & O_{tx} & O_{ts}\\ A_{xt} & A_{xx} & O_{xs}\\ A_{st} & A_{sx} & A_{ss} \end{pmatrix} \begin{pmatrix} T\\ X\\ S \end{pmatrix} + \begin{pmatrix} \varepsilon_{t;\mathrm{pa}(t)}\\ \varepsilon_{x;\mathrm{pa}(x)}\\ \varepsilon_{s;\mathrm{pa}(s)} \end{pmatrix}. \tag{1}$$

Here, $\mu_{t;\mathrm{pa}(t)}$, $\mu_{x;\mathrm{pa}(x)}$ and $\mu_{s;\mathrm{pa}(s)}$ are the means of $T$, $X$ and $S$, respectively, when the effects of their parents are removed; $A_{tt}, A_{xt}, \ldots, A_{ss}$ are the matrices of path coefficients; and $\varepsilon_{t;\mathrm{pa}(t)}$, $\varepsilon_{x;\mathrm{pa}(x)}$ and $\varepsilon_{s;\mathrm{pa}(s)}$ are the error terms. We assume that the means of $\varepsilon_{t;\mathrm{pa}(t)}$, $\varepsilon_{x;\mathrm{pa}(x)}$ and $\varepsilon_{s;\mathrm{pa}(s)}$ are all zero and that they have the variance matrices $\Sigma_{tt;\mathrm{pa}(t)}$, $\Sigma_{xx;\mathrm{pa}(x)}$ and $\Sigma_{ss;\mathrm{pa}(s)}$, respectively. Furthermore, to avoid cycles in the structural equations, we assume that the diagonal and upper triangular elements of the coefficient matrices $A_{tt}$, $A_{xx}$ and $A_{ss}$ are all zero. This is always possible by sorting the variables according to their parent-child relations, provided that the structural equations contain no cycles. For example, the equations in Example 1 can be written in the form (1) by letting $T = \{T\}$, $X = \{X_1, X_2\}$ and $S = \{S_1, S_2\}$, where the matrices of the path coefficients are as follows:



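$$A_{tt} = (0),\quad A_{xt} = \begin{pmatrix}\alpha_{x_1t}\\ \alpha_{x_2t}\end{pmatrix},\quad A_{xx} = \begin{pmatrix}0 & 0\\ \alpha_{x_2x_1} & 0\end{pmatrix},$$

$$A_{st} = \begin{pmatrix}0\\ \alpha_{s_2t}\end{pmatrix},\quad A_{sx} = \begin{pmatrix}\alpha_{s_1x_1} & 0\\ 0 & \alpha_{s_2x_2}\end{pmatrix},\quad A_{ss} = \begin{pmatrix}0 & 0\\ \alpha_{s_2s_1} & 0\end{pmatrix}.$$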



2.2 Total Effects, Means and Variances 



For a given structural equation model, the total effect from a variable $V_1 \in V$ to a variable $V_2 \in V$ that is a descendant of $V_1$ is defined as the change in $V_2$ produced when $V_1$ is increased by 1 and all error terms are fixed to 0. Therefore, the total effect from $V_1$ to $V_2$ is equal to the derivative of $V_2$ with respect to $V_1$ in the structural equations with all error terms eliminated. The direct effect from $V_1$ to $V_2$ is defined as the path coefficient from $V_1$ to $V_2$, and it coincides with the partial derivative of $V_2$ with respect to $V_1$ in the structural equations with all error terms eliminated. The indirect effect from $V_1$ to $V_2$ is defined as the total effect minus the direct effect. For precise and general definitions of terms such as direct, indirect and total effects, see Bollen (1987), Bollen (1989), Sobel (1990) and Pearl (2009). Let us consider the following example.



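Example 2. In Example 1, the total effect from $T$ to $S_2$ is calculated as follows.

We obtain the following equations by eliminating all error terms in the structural equations of Example 1:

$$\begin{aligned}
T &= \mu_{t;\mathrm{pa}(t)},\\
X_1 &= \mu_{x_1;\mathrm{pa}(x_1)} + \alpha_{x_1t}T,\\
X_2 &= \mu_{x_2;\mathrm{pa}(x_2)} + \alpha_{x_2t}T + \alpha_{x_2x_1}X_1,\\
S_1 &= \mu_{s_1;\mathrm{pa}(s_1)} + \alpha_{s_1x_1}X_1,\\
S_2 &= \mu_{s_2;\mathrm{pa}(s_2)} + \alpha_{s_2t}T + \alpha_{s_2x_2}X_2 + \alpha_{s_2s_1}S_1.
\end{aligned}$$

From the above equations, we obtain the following relation between $S_2$ and $T$ when all error terms are fixed to 0:

$$\begin{aligned}
S_2 &= \mu_{s_2;\mathrm{pa}(s_2)} + \alpha_{s_2t}T + \alpha_{s_2x_2}X_2 + \alpha_{s_2s_1}S_1\\
&= \mu_{s_2;\mathrm{pa}(s_2)} + \alpha_{s_2t}T + \alpha_{s_2x_2}(\mu_{x_2;\mathrm{pa}(x_2)} + \alpha_{x_2t}T + \alpha_{x_2x_1}X_1) + \alpha_{s_2s_1}(\mu_{s_1;\mathrm{pa}(s_1)} + \alpha_{s_1x_1}X_1)\\
&= \mu_{s_2;\mathrm{pa}(s_2)} + \alpha_{s_2t}T + \alpha_{s_2x_2}\{\mu_{x_2;\mathrm{pa}(x_2)} + \alpha_{x_2t}T + \alpha_{x_2x_1}(\mu_{x_1;\mathrm{pa}(x_1)} + \alpha_{x_1t}T)\}\\
&\quad + \alpha_{s_2s_1}\{\mu_{s_1;\mathrm{pa}(s_1)} + \alpha_{s_1x_1}(\mu_{x_1;\mathrm{pa}(x_1)} + \alpha_{x_1t}T)\}\\
&= \mu_{s_2;\mathrm{pa}(s_2)} + \alpha_{s_2x_2}\mu_{x_2;\mathrm{pa}(x_2)} + \alpha_{s_2s_1}\mu_{s_1;\mathrm{pa}(s_1)} + (\alpha_{s_2x_2}\alpha_{x_2x_1} + \alpha_{s_2s_1}\alpha_{s_1x_1})\mu_{x_1;\mathrm{pa}(x_1)}\\
&\quad + (\alpha_{s_2t} + \alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t} + \alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t} + \alpha_{s_2x_2}\alpha_{x_2t})T.
\end{aligned}$$

Therefore, the total effect from $T$ to $S_2$ is equal to $\alpha_{s_2t} + \alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t} + \alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t} + \alpha_{s_2x_2}\alpha_{x_2t}$. The total effect can be decomposed into direct and indirect effects. First, the direct effect is $\alpha_{s_2t}$, which is the path coefficient of $T \to S_2$. The remainder, $\alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t} + \alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t} + \alpha_{s_2x_2}\alpha_{x_2t}$, is the indirect effect, and the terms $\alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t}$, $\alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t}$ and $\alpha_{s_2x_2}\alpha_{x_2t}$ correspond, respectively, to the paths $T \to X_1 \to S_1 \to S_2$, $T \to X_1 \to X_2 \to S_2$ and $T \to X_2 \to S_2$. □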

Let us denote the total effect from $V_1 \in V$ to $V_2 \in V$ by $\tau_{v_2v_1}$. Furthermore, let us denote the matrix of the total effects from $U \subset V$ to $W \subset V$ by $\tau_{wu}$, where $U \cap W = \emptyset$ and the $(i, j)$-element of $\tau_{wu}$ is the total effect from $U_j \in U$ to $W_i \in W$.



Proposition 1 (Bollen (1987); Sobel (1990)). Assume that the structural equations for $V$ are written as $V = \mu_{v;\mathrm{pa}(v)} + A_{vv}V + \varepsilon_{v;\mathrm{pa}(v)}$. Furthermore, assume that $I_{vv} - A_{vv}$ is invertible, where $I_{vv}$ is the identity matrix. Then the matrix of the total effects $\tau_{vv}$ is given by

$$\tau_{vv} = (I_{vv} - A_{vv})^{-1}A_{vv}.$$

Note that $(I_{vv} - A_{vv})^{-1}$ always exists in the model (1): after sorting the variables, $A_{vv}$ is strictly lower triangular, so $I_{vv} - A_{vv}$ is lower triangular with unit diagonal. Intuitively, the elements of $A_{vv}$ represent the direct effects and the elements of $A_{vv}^2$ represent the indirect effects through one intermediate variable. In the same way, the elements of $A_{vv}^n$ can be regarded as the indirect effects through $n - 1$ intermediate variables. Therefore, the total effect is equal to $A_{vv} + A_{vv}^2 + A_{vv}^3 + \cdots = (I_{vv} - A_{vv})^{-1}A_{vv}$ (the sum is finite because a strictly lower triangular matrix is nilpotent), and the above proposition holds.
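As a numerical sanity check of Proposition 1 (not part of the original analysis), the sketch below assigns hypothetical values to the path coefficients of Example 1, computes $(I - A)^{-1}A$ with NumPy, and compares it with the finite series $A + A^2 + A^3 + A^4$ and with the four-path expression for the total effect from $T$ to $S_2$ obtained in Example 2. The coefficient values are arbitrary.

```python
import numpy as np

# Hypothetical path-coefficient values for Example 1 (illustration only).
a_x1t, a_x2t, a_x2x1 = 0.8, -0.5, 0.6
a_s1x1, a_s2t, a_s2x2, a_s2s1 = 1.2, 0.3, 0.7, -0.4

# Variable order V = (T, X1, X2, S1, S2); A[i, j] is the path coefficient from V_j to V_i.
A = np.zeros((5, 5))
A[1, 0] = a_x1t
A[2, 0], A[2, 1] = a_x2t, a_x2x1
A[3, 1] = a_s1x1
A[4, 0], A[4, 2], A[4, 3] = a_s2t, a_s2x2, a_s2s1

I = np.eye(5)
tau = np.linalg.solve(I - A, A)  # (I - A)^{-1} A, as in Proposition 1

# A is strictly lower triangular, hence A^5 = 0 and the series stops after A^4.
series = sum(np.linalg.matrix_power(A, k) for k in range(1, 5))

# Total effect from T to S2 as the sum over the four paths found in Example 2.
path_sum = (a_s2t + a_s2s1 * a_s1x1 * a_x1t
            + a_s2x2 * a_x2x1 * a_x1t + a_s2x2 * a_x2t)

print(np.allclose(tau, series), tau[4, 0], path_sum)
```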

In the next example, we decompose a total effect and introduce some notation that is useful for calculating the means and variances of $V$ later in this section.

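Example 3. Assume that six random variables $T_1$, $T_2$, $X_1$, $X_2$, $S_1$ and $S_2$ are generated by the following linear structural equations:

$$\begin{aligned}
T_1 &= \mu_{t_1;\mathrm{pa}(t_1)} + \varepsilon_{t_1;\mathrm{pa}(t_1)},\\
T_2 &= \mu_{t_2;\mathrm{pa}(t_2)} + \alpha_{t_2t_1}T_1 + \varepsilon_{t_2;\mathrm{pa}(t_2)},\\
X_1 &= \mu_{x_1;\mathrm{pa}(x_1)} + \alpha_{x_1t_1}T_1 + \varepsilon_{x_1;\mathrm{pa}(x_1)},\\
X_2 &= \mu_{x_2;\mathrm{pa}(x_2)} + \alpha_{x_2t_1}T_1 + \alpha_{x_2t_2}T_2 + \alpha_{x_2x_1}X_1 + \varepsilon_{x_2;\mathrm{pa}(x_2)},\\
S_1 &= \mu_{s_1;\mathrm{pa}(s_1)} + \alpha_{s_1t_1}T_1 + \alpha_{s_1x_1}X_1 + \varepsilon_{s_1;\mathrm{pa}(s_1)},\\
S_2 &= \mu_{s_2;\mathrm{pa}(s_2)} + \alpha_{s_2t_1}T_1 + \alpha_{s_2t_2}T_2 + \alpha_{s_2x_1}X_1 + \alpha_{s_2x_2}X_2 + \alpha_{s_2s_1}S_1 + \varepsilon_{s_2;\mathrm{pa}(s_2)}.
\end{aligned}$$

The path diagram of these structural equations is given in Figure 2.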

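The above equations can be written in the form (1) by letting $T = \{T_1, T_2\}$, $X = \{X_1, X_2\}$ and $S = \{S_1, S_2\}$, where the matrices of the path coefficients are as follows:

$$A_{tt} = \begin{pmatrix}0 & 0\\ \alpha_{t_2t_1} & 0\end{pmatrix},\quad A_{xt} = \begin{pmatrix}\alpha_{x_1t_1} & 0\\ \alpha_{x_2t_1} & \alpha_{x_2t_2}\end{pmatrix},\quad A_{xx} = \begin{pmatrix}0 & 0\\ \alpha_{x_2x_1} & 0\end{pmatrix},$$

$$A_{st} = \begin{pmatrix}\alpha_{s_1t_1} & 0\\ \alpha_{s_2t_1} & \alpha_{s_2t_2}\end{pmatrix},\quad A_{sx} = \begin{pmatrix}\alpha_{s_1x_1} & 0\\ \alpha_{s_2x_1} & \alpha_{s_2x_2}\end{pmatrix},\quad A_{ss} = \begin{pmatrix}0 & 0\\ \alpha_{s_2s_1} & 0\end{pmatrix}.$$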

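In this model, the total effect from $T_1$ to $S_2$ is calculated as follows:

$$\begin{aligned}
\tau_{s_2t_1} = {}& \alpha_{s_2t_1} + \alpha_{s_2s_1}\alpha_{s_1t_1} + \alpha_{s_2t_2}\alpha_{t_2t_1} + \alpha_{s_2x_1}\alpha_{x_1t_1} + \alpha_{s_2x_2}\alpha_{x_2t_1}\\
& + \alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t_1} + \alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t_1} + \alpha_{s_2x_2}\alpha_{x_2t_2}\alpha_{t_2t_1}.
\end{aligned}$$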
Figure 2: An example of a path diagram with six variables
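Furthermore, the total effect from $T_1$ to $S_2$ is decomposed into the following eight paths:

$$\begin{gathered}
T_1 \to S_2,\quad T_1 \to S_1 \to S_2,\quad T_1 \to T_2 \to S_2,\\
T_1 \to X_1 \to S_2,\quad T_1 \to X_2 \to S_2,\quad T_1 \to X_1 \to X_2 \to S_2,\\
T_1 \to X_1 \to S_1 \to S_2,\quad T_1 \to T_2 \to X_2 \to S_2.
\end{gathered}\tag{2}$$

Among the above paths, only the first path, $T_1 \to S_2$, represents the direct effect, with the value $\alpha_{s_2t_1}$; the other paths represent indirect effects, with the values $\alpha_{s_2s_1}\alpha_{s_1t_1}$, $\alpha_{s_2t_2}\alpha_{t_2t_1}$, $\alpha_{s_2x_1}\alpha_{x_1t_1}$, $\alpha_{s_2x_2}\alpha_{x_2t_1}$, $\alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t_1}$, $\alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t_1}$ and $\alpha_{s_2x_2}\alpha_{x_2t_2}\alpha_{t_2t_1}$, respectively.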

Now, we decompose the total effect from $T_1$ to $S_2$ into the following two parts.

1. Let us denote by $\tau_{s_2t_1}(T \to X \to S)$ the total effect from $T_1$ to $S_2$ through $X$. Because the last five paths in (2) go through $X_1$ or $X_2$, we obtain

$$\tau_{s_2t_1}(T \to X \to S) = \alpha_{s_2x_1}\alpha_{x_1t_1} + \alpha_{s_2x_2}\alpha_{x_2t_1} + \alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t_1} + \alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t_1} + \alpha_{s_2x_2}\alpha_{x_2t_2}\alpha_{t_2t_1}.$$

This is equal to the total effect from $T_1$ to $S_2$ in the model of Figure 3.

2. Let us denote by $\tau_{s_2t_1}(T \to S)$ the total effect from $T_1$ to $S_2$ when the effect of $X$ is removed. As seen from the above decomposition, the first three paths in (2) do not go through $X_1$ or $X_2$. Therefore, we obtain

$$\tau_{s_2t_1}(T \to S) = \alpha_{s_2t_1} + \alpha_{s_2s_1}\alpha_{s_1t_1} + \alpha_{s_2t_2}\alpha_{t_2t_1}.$$

This is equal to the total effect from $T_1$ to $S_2$ in the model of Figure 4.
Figure 3: A path diagram in which the direct paths from T to S are removed.







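Next, let us consider the following two matrices:

$$\tau_{st}(T \to X \to S) = (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1},$$

$$\tau_{st}(T \to S) = (I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1},$$

where $I_{tt}$, $I_{xx}$ and $I_{ss}$ are the identity matrices. Then we obtain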
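$$\begin{aligned}
\tau_{st}(T \to X \to S) &= \begin{pmatrix}1 & 0\\ -\alpha_{s_2s_1} & 1\end{pmatrix}^{-1}\begin{pmatrix}\alpha_{s_1x_1} & 0\\ \alpha_{s_2x_1} & \alpha_{s_2x_2}\end{pmatrix}\begin{pmatrix}1 & 0\\ -\alpha_{x_2x_1} & 1\end{pmatrix}^{-1}\begin{pmatrix}\alpha_{x_1t_1} & 0\\ \alpha_{x_2t_1} & \alpha_{x_2t_2}\end{pmatrix}\begin{pmatrix}1 & 0\\ -\alpha_{t_2t_1} & 1\end{pmatrix}^{-1}\\
&= \begin{pmatrix}\alpha_{s_1x_1}\alpha_{x_1t_1} & 0\\ \alpha_{s_2x_1}\alpha_{x_1t_1} + \alpha_{s_2x_2}\alpha_{x_2t_1} + \alpha_{s_2x_2}\alpha_{x_2x_1}\alpha_{x_1t_1} + \alpha_{s_2s_1}\alpha_{s_1x_1}\alpha_{x_1t_1} + \alpha_{s_2x_2}\alpha_{x_2t_2}\alpha_{t_2t_1} & \alpha_{s_2x_2}\alpha_{x_2t_2}\end{pmatrix}
\end{aligned}$$

and

$$\begin{aligned}
\tau_{st}(T \to S) &= \begin{pmatrix}1 & 0\\ -\alpha_{s_2s_1} & 1\end{pmatrix}^{-1}\begin{pmatrix}\alpha_{s_1t_1} & 0\\ \alpha_{s_2t_1} & \alpha_{s_2t_2}\end{pmatrix}\begin{pmatrix}1 & 0\\ -\alpha_{t_2t_1} & 1\end{pmatrix}^{-1}\\
&= \begin{pmatrix}\alpha_{s_1t_1} & 0\\ \alpha_{s_2t_1} + \alpha_{s_2s_1}\alpha_{s_1t_1} + \alpha_{s_2t_2}\alpha_{t_2t_1} & \alpha_{s_2t_2}\end{pmatrix}.
\end{aligned}$$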

Note that the (2, 1)-elements of $\tau_{st}(T \to X \to S)$ and $\tau_{st}(T \to S)$, which correspond to $(S_2, T_1)$, are equal to $\tau_{s_2t_1}(T \to X \to S)$ and $\tau_{s_2t_1}(T \to S)$, respectively. This equality is justified by Theorem 1. □
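The matrix products in Example 3 can also be checked numerically. In this sketch the coefficient values are hypothetical; the code forms $\tau_{st}(T \to X \to S)$ and $\tau_{st}(T \to S)$ from their definitions and compares their $(2, 1)$-elements with the sums over the last five and the first three paths in (2).

```python
import numpy as np

inv = np.linalg.inv
I2 = np.eye(2)

# Hypothetical path-coefficient values for Example 3 (illustration only).
a = {
    "t2t1": 0.5, "x1t1": 0.9, "x2t1": -0.3, "x2t2": 0.4, "x2x1": 0.6,
    "s1t1": 0.2, "s1x1": 1.1, "s2t1": 0.7, "s2t2": -0.5,
    "s2x1": 0.3, "s2x2": 0.8, "s2s1": -0.6,
}
A_tt = np.array([[0.0, 0.0], [a["t2t1"], 0.0]])
A_xt = np.array([[a["x1t1"], 0.0], [a["x2t1"], a["x2t2"]]])
A_xx = np.array([[0.0, 0.0], [a["x2x1"], 0.0]])
A_st = np.array([[a["s1t1"], 0.0], [a["s2t1"], a["s2t2"]]])
A_sx = np.array([[a["s1x1"], 0.0], [a["s2x1"], a["s2x2"]]])
A_ss = np.array([[0.0, 0.0], [a["s2s1"], 0.0]])

tau_txs = inv(I2 - A_ss) @ A_sx @ inv(I2 - A_xx) @ A_xt @ inv(I2 - A_tt)  # tau_st(T -> X -> S)
tau_ts = inv(I2 - A_ss) @ A_st @ inv(I2 - A_tt)                           # tau_st(T -> S)

# Sums over the last five paths (through X) and the first three paths (not through X) in (2).
through_x = (a["s2x1"] * a["x1t1"] + a["s2x2"] * a["x2t1"]
             + a["s2x2"] * a["x2x1"] * a["x1t1"]
             + a["s2s1"] * a["s1x1"] * a["x1t1"]
             + a["s2x2"] * a["x2t2"] * a["t2t1"])
not_through_x = a["s2t1"] + a["s2s1"] * a["s1t1"] + a["s2t2"] * a["t2t1"]

# With 0-based indexing, the (2, 1)-element (S2, T1) is entry [1, 0].
print(tau_txs[1, 0], through_x)
print(tau_ts[1, 0], not_through_x)
```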

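As in Example 3, we define the following two matrices for the model (1):

$$\tau_{st}(T \to X \to S) \stackrel{\mathrm{def}}{=} (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}, \tag{3}$$

$$\tau_{st}(T \to S) \stackrel{\mathrm{def}}{=} (I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1}, \tag{4}$$

where $I_{tt}$, $I_{xx}$ and $I_{ss}$ are the identity matrices. The next lemma can be shown by direct calculation.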
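Lemma 1. Let $B$ be a square matrix that can be represented as follows:

$$B = \begin{pmatrix} B_{11} & O & O\\ B_{21} & B_{22} & O\\ B_{31} & B_{32} & B_{33}\end{pmatrix},$$

where $B_{11}$, $B_{22}$ and $B_{33}$ are square matrices. If $B_{11}$, $B_{22}$ and $B_{33}$ are non-singular, then the following equation holds for the inverse matrix of $B$:

$$B^{-1} = \begin{pmatrix} B_{11}^{-1} & O & O\\ -B_{22}^{-1}B_{21}B_{11}^{-1} & B_{22}^{-1} & O\\ B_{33}^{-1}B_{32}B_{22}^{-1}B_{21}B_{11}^{-1} - B_{33}^{-1}B_{31}B_{11}^{-1} & -B_{33}^{-1}B_{32}B_{22}^{-1} & B_{33}^{-1}\end{pmatrix}.$$

□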
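Lemma 1 is easy to verify numerically. The sketch below implements the block formula (the helper name `block_lower_inverse` is ours) and compares it with `numpy.linalg.inv` on randomly generated blocks; the block sizes, the random seed and the diagonal shift that keeps the diagonal blocks well conditioned are arbitrary assumptions.

```python
import numpy as np

def block_lower_inverse(B11, B21, B22, B31, B32, B33):
    """Inverse of a 3 x 3 block lower-triangular matrix via the formula of Lemma 1."""
    inv = np.linalg.inv
    i11, i22, i33 = inv(B11), inv(B22), inv(B33)
    n1, n2, n3 = B11.shape[0], B22.shape[0], B33.shape[0]
    return np.block([
        [i11, np.zeros((n1, n2)), np.zeros((n1, n3))],
        [-i22 @ B21 @ i11, i22, np.zeros((n2, n3))],
        [i33 @ B32 @ i22 @ B21 @ i11 - i33 @ B31 @ i11, -i33 @ B32 @ i22, i33],
    ])

rng = np.random.default_rng(0)
sizes = (2, 3, 2)  # hypothetical block sizes
# Diagonal blocks shifted by 3I so that they are comfortably non-singular.
B11, B22, B33 = (rng.normal(size=(n, n)) + 3 * np.eye(n) for n in sizes)
B21 = rng.normal(size=(3, 2))
B31 = rng.normal(size=(2, 2))
B32 = rng.normal(size=(2, 3))

B = np.block([
    [B11, np.zeros((2, 3)), np.zeros((2, 2))],
    [B21, B22, np.zeros((3, 2))],
    [B31, B32, B33],
])
print(np.allclose(block_lower_inverse(B11, B21, B22, B31, B32, B33), np.linalg.inv(B)))
```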
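In the next theorem, we obtain the matrix representations of the total effects from $T$ to $X$, from $X$ to $S$ and from $T$ to $S$, and justify the decomposition of the total effect treated in Example 3.

Theorem 1. Let $A$ be the coefficient matrix of the model (1) and $I$ the identity matrix of the same size. Then

$$\tau_{xt} = [(I - A)^{-1}A]_{xt} = (I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}, \tag{5}$$

$$\tau_{sx} = [(I - A)^{-1}A]_{sx} = (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}, \tag{6}$$

$$\begin{aligned}
\tau_{st} &= [(I - A)^{-1}A]_{st}\\
&= (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1} + (I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1} \qquad (7)\\
&= \tau_{st}(T \to X \to S) + \tau_{st}(T \to S), \qquad (8)
\end{aligned}$$

where $[(I - A)^{-1}A]_{uw}$ for $U, W \subset V$ is the submatrix of $(I - A)^{-1}A$ corresponding to the rows of $U$ and the columns of $W$.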



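Proof: By letting $B_{11} = I_{tt} - A_{tt}$, $B_{21} = -A_{xt}$, $B_{22} = I_{xx} - A_{xx}$, $B_{31} = -A_{st}$, $B_{32} = -A_{sx}$ and $B_{33} = I_{ss} - A_{ss}$ in Lemma 1, we obtain

$$(I - A)^{-1} = \begin{pmatrix}(I_{tt} - A_{tt})^{-1} & O & O\\ (I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1} & (I_{xx} - A_{xx})^{-1} & O\\ A^{*}_{st} & (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1} & (I_{ss} - A_{ss})^{-1}\end{pmatrix},$$

where

$$A^{*}_{st} = (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1} + (I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1}.$$

By using the definitions of $\tau_{st}(T \to X \to S)$ and $\tau_{st}(T \to S)$ in (3) and (4), and the identity $(I - C)^{-1}C + I = (I - C)^{-1}$, which holds for any square matrix $C$ such that $I - C$ is non-singular, we obtain

$$(I - A)^{-1}A = (I - A)^{-1} - I = \begin{pmatrix}(I_{tt} - A_{tt})^{-1}A_{tt} & O & O\\ (I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1} & (I_{xx} - A_{xx})^{-1}A_{xx} & O\\ \tau_{st}(T \to X \to S) + \tau_{st}(T \to S) & (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1} & (I_{ss} - A_{ss})^{-1}A_{ss}\end{pmatrix}.$$

Therefore, we obtain the matrix representations (5), (6) and (7), and the decomposition $\tau_{st} = \tau_{st}(T \to X \to S) + \tau_{st}(T \to S)$. □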
Note that

$$\tau_{tt} = (I_{tt} - A_{tt})^{-1}A_{tt},\quad \tau_{xx} = (I_{xx} - A_{xx})^{-1}A_{xx},\quad \tau_{ss} = (I_{ss} - A_{ss})^{-1}A_{ss} \tag{9}$$

are also obtained from the proof of Theorem 1; they also follow from Proposition 1. Furthermore, note that, for example, the matrix of total effects $\tau_{xt}$ requires both the premultiplication by $(I_{xx} - A_{xx})^{-1}$ and the postmultiplication by $(I_{tt} - A_{tt})^{-1}$. This differs from the form of Proposition 1, where only a premultiplication appears.
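The following sketch checks Theorem 1 and (9) on a randomly generated model of the form (1). The block sizes and the random coefficients are hypothetical; the point is only that the blocks of $(I - A)^{-1}A$ agree with the closed forms (5) to (9).

```python
import numpy as np

rng = np.random.default_rng(1)
nt, nx, ns = 2, 2, 3  # hypothetical sizes of T, X and S
inv = np.linalg.inv

def strictly_lower(n):
    """Random block with zero diagonal and upper triangle, as assumed for A_tt, A_xx, A_ss."""
    return np.tril(rng.normal(size=(n, n)), k=-1)

A_tt, A_xx, A_ss = strictly_lower(nt), strictly_lower(nx), strictly_lower(ns)
A_xt = rng.normal(size=(nx, nt))
A_st = rng.normal(size=(ns, nt))
A_sx = rng.normal(size=(ns, nx))

# Coefficient matrix of model (1) with the blocks ordered (T, X, S).
A = np.block([
    [A_tt, np.zeros((nt, nx)), np.zeros((nt, ns))],
    [A_xt, A_xx, np.zeros((nx, ns))],
    [A_st, A_sx, A_ss],
])
n = nt + nx + ns
tau = np.linalg.solve(np.eye(n) - A, A)  # (I - A)^{-1} A
t, x, s = slice(0, nt), slice(nt, nt + nx), slice(nt + nx, n)

It, Ix, Is = np.eye(nt), np.eye(nx), np.eye(ns)
checks = {
    "(5)": np.allclose(tau[x, t], inv(Ix - A_xx) @ A_xt @ inv(It - A_tt)),
    "(6)": np.allclose(tau[s, x], inv(Is - A_ss) @ A_sx @ inv(Ix - A_xx)),
    "(7)-(8)": np.allclose(
        tau[s, t],
        inv(Is - A_ss) @ A_sx @ inv(Ix - A_xx) @ A_xt @ inv(It - A_tt)  # (3)
        + inv(Is - A_ss) @ A_st @ inv(It - A_tt)),                      # (4)
    "(9)": all(np.allclose(tau[b, b], inv(np.eye(k) - Abb) @ Abb)
               for b, k, Abb in ((t, nt, A_tt), (x, nx, A_xx), (s, ns, A_ss))),
}
print(checks)
```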

Next, we calculate the means of $T$, $X$ and $S$. From the structural equation model (1), we obtain the following equations:

$$\begin{aligned}
(I_{tt} - A_{tt})T &= \mu_{t;\mathrm{pa}(t)} + \varepsilon_{t;\mathrm{pa}(t)},\\
(I_{xx} - A_{xx})X &= \mu_{x;\mathrm{pa}(x)} + A_{xt}T + \varepsilon_{x;\mathrm{pa}(x)},\\
(I_{ss} - A_{ss})S &= \mu_{s;\mathrm{pa}(s)} + A_{sx}X + A_{st}T + \varepsilon_{s;\mathrm{pa}(s)}.
\end{aligned}$$

By multiplying both sides of the above three equations by the inverses of $I_{tt} - A_{tt}$, $I_{xx} - A_{xx}$ and $I_{ss} - A_{ss}$, respectively, we obtain the following equations:

$$T = (I_{tt} - A_{tt})^{-1}\mu_{t;\mathrm{pa}(t)} + (I_{tt} - A_{tt})^{-1}\varepsilon_{t;\mathrm{pa}(t)}, \tag{10}$$

$$X = (I_{xx} - A_{xx})^{-1}\mu_{x;\mathrm{pa}(x)} + (I_{xx} - A_{xx})^{-1}A_{xt}T + (I_{xx} - A_{xx})^{-1}\varepsilon_{x;\mathrm{pa}(x)}, \tag{11}$$

$$S = (I_{ss} - A_{ss})^{-1}\mu_{s;\mathrm{pa}(s)} + (I_{ss} - A_{ss})^{-1}A_{st}T + (I_{ss} - A_{ss})^{-1}A_{sx}X + (I_{ss} - A_{ss})^{-1}\varepsilon_{s;\mathrm{pa}(s)}. \tag{12}$$

By taking the means of both sides of (10), (11) and (12), we can compute the means of $T$, $X$ and $S$, and obtain the following proposition.



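Proposition 2.

$$E[T] = (\tau_{tt} + I_{tt})\mu_{t;\mathrm{pa}(t)}, \tag{13}$$

$$E[X] = \tau_{xt}\mu_{t;\mathrm{pa}(t)} + (\tau_{xx} + I_{xx})\mu_{x;\mathrm{pa}(x)}, \tag{14}$$

$$E[S] = \tau_{st}\mu_{t;\mathrm{pa}(t)} + \tau_{sx}\mu_{x;\mathrm{pa}(x)} + (\tau_{ss} + I_{ss})\mu_{s;\mathrm{pa}(s)}. \tag{15}$$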



Proof: By taking the means of both sides of (10) and using (9), we obtain (13) as follows:

$$E[T] = (I_{tt} - A_{tt})^{-1}\mu_{t;\mathrm{pa}(t)} = (I_{tt} - A_{tt})^{-1}\{A_{tt} + (I_{tt} - A_{tt})\}\mu_{t;\mathrm{pa}(t)} = (\tau_{tt} + I_{tt})\mu_{t;\mathrm{pa}(t)}.$$

Next, by substituting (10) into (11) and taking the means, we obtain (14) as follows:

$$\begin{aligned}
E[X] &= (I_{xx} - A_{xx})^{-1}A_{xt}E[T] + (I_{xx} - A_{xx})^{-1}\mu_{x;\mathrm{pa}(x)}\\
&= (I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\mu_{t;\mathrm{pa}(t)} + (I_{xx} - A_{xx})^{-1}\mu_{x;\mathrm{pa}(x)}\\
&= \tau_{xt}\mu_{t;\mathrm{pa}(t)} + (\tau_{xx} + I_{xx})\mu_{x;\mathrm{pa}(x)},
\end{aligned}$$

where we use (5) and (9) in the third equality.

Finally, by substituting (10) and (11) into (12) and taking the means, we obtain (15) as follows:

$$\begin{aligned}
E[S] &= (I_{ss} - A_{ss})^{-1}A_{sx}\{(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\mu_{t;\mathrm{pa}(t)} + (I_{xx} - A_{xx})^{-1}\mu_{x;\mathrm{pa}(x)}\}\\
&\quad + (I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1}\mu_{t;\mathrm{pa}(t)} + (I_{ss} - A_{ss})^{-1}\mu_{s;\mathrm{pa}(s)}\\
&= \{\tau_{st}(T \to X \to S) + \tau_{st}(T \to S)\}\mu_{t;\mathrm{pa}(t)} + \tau_{sx}\mu_{x;\mathrm{pa}(x)} + (\tau_{ss} + I_{ss})\mu_{s;\mathrm{pa}(s)}\\
&= \tau_{st}\mu_{t;\mathrm{pa}(t)} + \tau_{sx}\mu_{x;\mathrm{pa}(x)} + (\tau_{ss} + I_{ss})\mu_{s;\mathrm{pa}(s)},
\end{aligned}$$

where we use (3), (4), (6) and (9) in the second equality and (8) in the third equality. □
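To see Proposition 2 at work, the sketch below uses the structure of Example 1 with hypothetical coefficient and intercept values, evaluates (13) to (15), and compares the result with the direct solution $E[V] = (I - A)^{-1}\mu$ of the stacked model (1), which holds because the error terms have zero mean. The numbers are illustrative only.

```python
import numpy as np

inv = np.linalg.inv
# Example 1 (T = {T}, X = {X1, X2}, S = {S1, S2}) with hypothetical values.
A_tt = np.zeros((1, 1))
A_xt = np.array([[0.8], [-0.5]])
A_xx = np.array([[0.0, 0.0], [0.6, 0.0]])
A_st = np.array([[0.0], [0.3]])
A_sx = np.array([[1.2, 0.0], [0.0, 0.7]])
A_ss = np.array([[0.0, 0.0], [-0.4, 0.0]])
mu_t, mu_x, mu_s = np.array([2.0]), np.array([1.0, -1.0]), np.array([0.5, 3.0])

It, Ix, Is = np.eye(1), np.eye(2), np.eye(2)
tau_tt = inv(It - A_tt) @ A_tt                   # (9)
tau_xx = inv(Ix - A_xx) @ A_xx                   # (9)
tau_ss = inv(Is - A_ss) @ A_ss                   # (9)
tau_xt = inv(Ix - A_xx) @ A_xt @ inv(It - A_tt)  # (5)
tau_sx = inv(Is - A_ss) @ A_sx @ inv(Ix - A_xx)  # (6)
tau_st = inv(Is - A_ss) @ (A_sx @ inv(Ix - A_xx) @ A_xt + A_st) @ inv(It - A_tt)  # (7)

# Proposition 2, equations (13)-(15).
E_T = (tau_tt + It) @ mu_t
E_X = tau_xt @ mu_t + (tau_xx + Ix) @ mu_x
E_S = tau_st @ mu_t + tau_sx @ mu_x + (tau_ss + Is) @ mu_s

# Direct computation from the stacked model (1): E[V] = (I - A)^{-1} mu.
A = np.block([[A_tt, np.zeros((1, 2)), np.zeros((1, 2))],
              [A_xt, A_xx, np.zeros((2, 2))],
              [A_st, A_sx, A_ss]])
E_V = np.linalg.solve(np.eye(5) - A, np.concatenate([mu_t, mu_x, mu_s]))
print(E_V, np.concatenate([E_T, E_X, E_S]))
```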
The above proposition shows that the means can be decomposed in terms of the total effects. Similar statements hold for the variance matrices of $T$, $X$ and $S$.

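Proposition 3. Assume that $\mathrm{Cov}[T, \varepsilon_{x;\mathrm{pa}(x)}] = \mathrm{Cov}[T, \varepsilon_{s;\mathrm{pa}(s)}] = \mathrm{Cov}[X, \varepsilon_{s;\mathrm{pa}(s)}] = O$. Then

$$V[T] = (\tau_{tt} + I_{tt})\Sigma_{tt;\mathrm{pa}(t)}(\tau_{tt} + I_{tt})^T, \tag{16}$$

$$V[X] = \tau_{xt}\Sigma_{tt;\mathrm{pa}(t)}\tau_{xt}^T + (\tau_{xx} + I_{xx})\Sigma_{xx;\mathrm{pa}(x)}(\tau_{xx} + I_{xx})^T, \tag{17}$$

$$V[S] = \tau_{st}\Sigma_{tt;\mathrm{pa}(t)}\tau_{st}^T + \tau_{sx}\Sigma_{xx;\mathrm{pa}(x)}\tau_{sx}^T + (\tau_{ss} + I_{ss})\Sigma_{ss;\mathrm{pa}(s)}(\tau_{ss} + I_{ss})^T. \tag{18}$$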
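Proof: From (10), we obtain (16) as follows:

$$\begin{aligned}
V[T] &= (I_{tt} - A_{tt})^{-1}V[\varepsilon_{t;\mathrm{pa}(t)}](I_{tt} - A_{tt})^{-T} \qquad (19)\\
&= (I_{tt} - A_{tt})^{-1}\{A_{tt} + (I_{tt} - A_{tt})\}V[\varepsilon_{t;\mathrm{pa}(t)}]\{A_{tt} + (I_{tt} - A_{tt})\}^T(I_{tt} - A_{tt})^{-T}\\
&= (\tau_{tt} + I_{tt})\Sigma_{tt;\mathrm{pa}(t)}(\tau_{tt} + I_{tt})^T,
\end{aligned}$$

where $(\cdot)^{-T}$ denotes the transpose of the inverse, and we use (9) and $V[\varepsilon_{t;\mathrm{pa}(t)}] = \Sigma_{tt;\mathrm{pa}(t)}$ in the third equality.

Next, from (10), (11), (19) and the assumption $\mathrm{Cov}[T, \varepsilon_{x;\mathrm{pa}(x)}] = O$, we obtain (17) as follows:

$$\begin{aligned}
V[X] &= (I_{xx} - A_{xx})^{-1}A_{xt}V[T]A_{xt}^T(I_{xx} - A_{xx})^{-T} + (I_{xx} - A_{xx})^{-1}V[\varepsilon_{x;\mathrm{pa}(x)}](I_{xx} - A_{xx})^{-T}\\
&= \{(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\}V[\varepsilon_{t;\mathrm{pa}(t)}]\{(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\}^T\\
&\quad + (I_{xx} - A_{xx})^{-1}V[\varepsilon_{x;\mathrm{pa}(x)}](I_{xx} - A_{xx})^{-T} \qquad (20)\\
&= \tau_{xt}\Sigma_{tt;\mathrm{pa}(t)}\tau_{xt}^T + (\tau_{xx} + I_{xx})\Sigma_{xx;\mathrm{pa}(x)}(\tau_{xx} + I_{xx})^T,
\end{aligned}$$

where we use (5), (9) and $V[\varepsilon_{x;\mathrm{pa}(x)}] = \Sigma_{xx;\mathrm{pa}(x)}$ in the third equality.

Finally, from the assumption $\mathrm{Cov}[T, \varepsilon_{s;\mathrm{pa}(s)}] = \mathrm{Cov}[X, \varepsilon_{s;\mathrm{pa}(s)}] = O$, we have

$$\begin{aligned}
V[S] &= (I_{ss} - A_{ss})^{-1}A_{st}V[T]A_{st}^T(I_{ss} - A_{ss})^{-T}\\
&\quad + (I_{ss} - A_{ss})^{-1}A_{st}\mathrm{Cov}[T, X]A_{sx}^T(I_{ss} - A_{ss})^{-T} + (I_{ss} - A_{ss})^{-1}A_{sx}\mathrm{Cov}[X, T]A_{st}^T(I_{ss} - A_{ss})^{-T}\\
&\quad + (I_{ss} - A_{ss})^{-1}A_{sx}V[X]A_{sx}^T(I_{ss} - A_{ss})^{-T}\\
&\quad + (I_{ss} - A_{ss})^{-1}V[\varepsilon_{s;\mathrm{pa}(s)}](I_{ss} - A_{ss})^{-T}. \qquad (21)
\end{aligned}$$

Now, from (10), (11), (19) and the assumption $\mathrm{Cov}[T, \varepsilon_{x;\mathrm{pa}(x)}] = O$, $\mathrm{Cov}[X, T] = \mathrm{Cov}[T, X]^T$ can be calculated as follows:

$$\mathrm{Cov}[X, T] = (I_{xx} - A_{xx})^{-1}A_{xt}V[T] = (I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}V[\varepsilon_{t;\mathrm{pa}(t)}](I_{tt} - A_{tt})^{-T}. \tag{22}$$

Therefore, by substituting (19), (20) and (22) into (21), we obtain (18) as follows:

$$\begin{aligned}
V[S] &= \{(I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1}\}V[\varepsilon_{t;\mathrm{pa}(t)}]\{(I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1}\}^T\\
&\quad + \{(I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\}V[\varepsilon_{t;\mathrm{pa}(t)}]\{(I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1}\}^T\\
&\quad + \{(I_{ss} - A_{ss})^{-1}A_{st}(I_{tt} - A_{tt})^{-1}\}V[\varepsilon_{t;\mathrm{pa}(t)}]\{(I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\}^T\\
&\quad + \{(I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\}V[\varepsilon_{t;\mathrm{pa}(t)}]\{(I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}A_{xt}(I_{tt} - A_{tt})^{-1}\}^T\\
&\quad + \{(I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}\}V[\varepsilon_{x;\mathrm{pa}(x)}]\{(I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}\}^T\\
&\quad + (I_{ss} - A_{ss})^{-1}V[\varepsilon_{s;\mathrm{pa}(s)}](I_{ss} - A_{ss})^{-T}\\
&= \tau_{st}(T \to S)\Sigma_{tt;\mathrm{pa}(t)}\tau_{st}(T \to S)^T + \tau_{st}(T \to X \to S)\Sigma_{tt;\mathrm{pa}(t)}\tau_{st}(T \to S)^T\\
&\quad + \tau_{st}(T \to S)\Sigma_{tt;\mathrm{pa}(t)}\tau_{st}(T \to X \to S)^T + \tau_{st}(T \to X \to S)\Sigma_{tt;\mathrm{pa}(t)}\tau_{st}(T \to X \to S)^T\\
&\quad + \tau_{sx}\Sigma_{xx;\mathrm{pa}(x)}\tau_{sx}^T + (\tau_{ss} + I_{ss})\Sigma_{ss;\mathrm{pa}(s)}(\tau_{ss} + I_{ss})^T\\
&= \tau_{st}\Sigma_{tt;\mathrm{pa}(t)}\tau_{st}^T + \tau_{sx}\Sigma_{xx;\mathrm{pa}(x)}\tau_{sx}^T + (\tau_{ss} + I_{ss})\Sigma_{ss;\mathrm{pa}(s)}(\tau_{ss} + I_{ss})^T,
\end{aligned}$$

where we use (3), (4), (6), (9), $V[\varepsilon_{t;\mathrm{pa}(t)}] = \Sigma_{tt;\mathrm{pa}(t)}$, $V[\varepsilon_{x;\mathrm{pa}(x)}] = \Sigma_{xx;\mathrm{pa}(x)}$ and $V[\varepsilon_{s;\mathrm{pa}(s)}] = \Sigma_{ss;\mathrm{pa}(s)}$ in the second equality, and (8) in the third equality. □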
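A numerical check of Proposition 3 can be made under the simplifying assumption that the three error vectors are mutually uncorrelated, which implies the covariance conditions of the proposition. The variance matrix of the stacked vector is then $(I - A)^{-1}\Sigma(I - A)^{-T}$ with block-diagonal $\Sigma$, and its $X$ and $S$ blocks should coincide with (17) and (18). All matrices in this sketch are randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
nt, nx, ns = 2, 2, 2  # hypothetical block sizes
inv = np.linalg.inv

def strictly_lower(n):
    """Random coefficient block with zero diagonal and upper triangle."""
    return np.tril(rng.normal(size=(n, n)), k=-1)

def random_cov(n):
    """Random symmetric positive definite matrix."""
    L = rng.normal(size=(n, n))
    return L @ L.T + n * np.eye(n)

A_tt, A_xx, A_ss = strictly_lower(nt), strictly_lower(nx), strictly_lower(ns)
A_xt, A_st, A_sx = rng.normal(size=(nx, nt)), rng.normal(size=(ns, nt)), rng.normal(size=(ns, nx))
S_tt, S_xx, S_ss = random_cov(nt), random_cov(nx), random_cov(ns)

It, Ix, Is = np.eye(nt), np.eye(nx), np.eye(ns)
tau_xx, tau_ss = inv(Ix - A_xx) @ A_xx, inv(Is - A_ss) @ A_ss
tau_xt = inv(Ix - A_xx) @ A_xt @ inv(It - A_tt)
tau_sx = inv(Is - A_ss) @ A_sx @ inv(Ix - A_xx)
tau_st = inv(Is - A_ss) @ (A_sx @ inv(Ix - A_xx) @ A_xt + A_st) @ inv(It - A_tt)

# Proposition 3, equations (17) and (18).
V_X = tau_xt @ S_tt @ tau_xt.T + (tau_xx + Ix) @ S_xx @ (tau_xx + Ix).T
V_S = (tau_st @ S_tt @ tau_st.T + tau_sx @ S_xx @ tau_sx.T
       + (tau_ss + Is) @ S_ss @ (tau_ss + Is).T)

# Direct computation with a block-diagonal error covariance.
Z = np.zeros
A = np.block([[A_tt, Z((nt, nx)), Z((nt, ns))],
              [A_xt, A_xx, Z((nx, ns))],
              [A_st, A_sx, A_ss]])
Sigma = np.block([[S_tt, Z((nt, nx)), Z((nt, ns))],
                  [Z((nx, nt)), S_xx, Z((nx, ns))],
                  [Z((ns, nt)), Z((ns, nx)), S_ss]])
M = inv(np.eye(nt + nx + ns) - A)
V_all = M @ Sigma @ M.T
x, s = slice(nt, nt + nx), slice(nt + nx, nt + nx + ns)
print(np.allclose(V_all[x, x], V_X), np.allclose(V_all[s, s], V_S))
```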
In the following, we only consider the case where the assumption of Proposition 3 holds, i.e., $\mathrm{Cov}[T, \varepsilon_{x;\mathrm{pa}(x)}] = \mathrm{Cov}[T, \varepsilon_{s;\mathrm{pa}(s)}] = \mathrm{Cov}[X, \varepsilon_{s;\mathrm{pa}(s)}] = O$.






2.3 Interventions to Structural Equation Models 



An intervention in a structural equation model means changing the structure of the model. Throughout this paper, we consider only interventions on the structure between $T$ and $X$ in the model (1) (for more general interventions, see Pearl (2009)). In this case, only the elements of $X$ are directly affected by the intervention; they are called treatment variables. Of course, the elements of $S$ are also affected indirectly by the intervention. The elements of $T$ are called covariates and the elements of $S$ are called output variables. The effects caused by the intervention are called intervention effects; for example, the changes in the means of the output variables $S$ after the intervention are intervention effects.

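Assume that $\mu_{x;\mathrm{pa}(x)}$, $A_{xt}$ and $\varepsilon_{x;\mathrm{pa}(x)}$ in (1) are changed into $\tilde\mu_{x;\mathrm{pa}(x)}$, $\tilde A_{xt}$ and $\tilde\varepsilon_{x;\mathrm{pa}(x)}$, respectively, by the intervention, where $\tilde\varepsilon_{x;\mathrm{pa}(x)}$ is a column vector of error terms whose means are all zero and whose variance matrix is $\tilde\Sigma_{xx;\mathrm{pa}(x)}$. Furthermore, we assume that the assumption of Proposition 3 still holds after the intervention, i.e., $\mathrm{Cov}[T, \tilde\varepsilon_{x;\mathrm{pa}(x)}] = \mathrm{Cov}[T, \varepsilon_{s;\mathrm{pa}(s)}] = \mathrm{Cov}[X, \varepsilon_{s;\mathrm{pa}(s)}] = O$. Then the structural equation for $X$ is changed from

$$X = \mu_{x;\mathrm{pa}(x)} + A_{xx}X + A_{xt}T + \varepsilon_{x;\mathrm{pa}(x)}$$

to

$$X = \tilde\mu_{x;\mathrm{pa}(x)} + A_{xx}X + \tilde A_{xt}T + \tilde\varepsilon_{x;\mathrm{pa}(x)}. \tag{23}$$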

Let us define the following matrices:

$$\tilde\tau_{st}(T \to X \to S) \stackrel{\mathrm{def}}{=} (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}\tilde A_{xt}(I_{tt} - A_{tt})^{-1}, \tag{24}$$

$$\tilde\tau_{st} \stackrel{\mathrm{def}}{=} \tilde\tau_{st}(T \to X \to S) + \tau_{st}(T \to S). \tag{25}$$

The elements of $\tilde\tau_{st}$ are the total effects from $T$ to $S$ after the intervention (23). Note that $\tau_{st}(T \to S)$ does not change under the intervention (23).

Let us denote by $\tilde E[S]$ and $\tilde V[S]$ the mean vector and the variance matrix of $S$ after the intervention (23). Then the following corollary follows immediately from Propositions 2 and 3.



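Corollary 1. After the intervention (23), the mean vector of $S$ is given by

$$\tilde E[S] = \tilde\tau_{st}\mu_{t;\mathrm{pa}(t)} + \tau_{sx}\tilde\mu_{x;\mathrm{pa}(x)} + (\tau_{ss} + I_{ss})\mu_{s;\mathrm{pa}(s)}.$$

Furthermore, assume that $\mathrm{Cov}[T, \tilde\varepsilon_{x;\mathrm{pa}(x)}] = \mathrm{Cov}[T, \varepsilon_{s;\mathrm{pa}(s)}] = \mathrm{Cov}[X, \varepsilon_{s;\mathrm{pa}(s)}] = O$; then the variance matrix of $S$ after the intervention (23) is given by

$$\tilde V[S] = \tilde\tau_{st}\Sigma_{tt;\mathrm{pa}(t)}\tilde\tau_{st}^T + \tau_{sx}\tilde\Sigma_{xx;\mathrm{pa}(x)}\tau_{sx}^T + (\tau_{ss} + I_{ss})\Sigma_{ss;\mathrm{pa}(s)}(\tau_{ss} + I_{ss})^T. \tag{26}$$

In the following sections, we treat only interventions under which the error terms of $X$ do not change, i.e., $\tilde\Sigma_{xx;\mathrm{pa}(x)} = \Sigma_{xx;\mathrm{pa}(x)}$.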
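Corollary 1 says that an intervention on $A_{xt}$ enters $\tilde V[S]$ only through $\tilde\tau_{st}$. The sketch below uses Example 1 with hypothetical coefficient values, hypothetical error variances and a hypothetical intervention on the path $T \to X_2$; it evaluates (24) to (26) and compares the result with the variance matrix of the whole intervened model, assuming mutually uncorrelated error blocks.

```python
import numpy as np

inv = np.linalg.inv
It, Ix, Is = np.eye(1), np.eye(2), np.eye(2)

# Example 1 blocks with hypothetical values (T = {T}, X = {X1, X2}, S = {S1, S2}).
A_tt = np.zeros((1, 1))
A_xx = np.array([[0.0, 0.0], [0.6, 0.0]])
A_st = np.array([[0.0], [0.3]])
A_sx = np.array([[1.2, 0.0], [0.0, 0.7]])
A_ss = np.array([[0.0, 0.0], [-0.4, 0.0]])
S_tt, S_xx, S_ss = np.eye(1), 0.5 * np.eye(2), 0.2 * np.eye(2)  # error variances

tau_sx = inv(Is - A_ss) @ A_sx @ inv(Ix - A_xx)
tau_ss = inv(Is - A_ss) @ A_ss
tau_ts = inv(Is - A_ss) @ A_st @ inv(It - A_tt)  # tau_st(T -> S): not affected by (23)

def var_S_after(A_xt_new):
    """V~[S] from (24)-(26), assuming the error terms of X are unchanged."""
    tau_txs = inv(Is - A_ss) @ A_sx @ inv(Ix - A_xx) @ A_xt_new @ inv(It - A_tt)  # (24)
    tau_st_new = tau_txs + tau_ts                                                # (25)
    return (tau_st_new @ S_tt @ tau_st_new.T + tau_sx @ S_xx @ tau_sx.T
            + (tau_ss + Is) @ S_ss @ (tau_ss + Is).T)                            # (26)

def var_S_direct(A_xt_new):
    """The same matrix from the whole intervened model: (I - A)^{-1} Sigma (I - A)^{-T}."""
    A = np.block([[A_tt, np.zeros((1, 2)), np.zeros((1, 2))],
                  [A_xt_new, A_xx, np.zeros((2, 2))],
                  [A_st, A_sx, A_ss]])
    Sigma = np.diag([1.0, 0.5, 0.5, 0.2, 0.2])  # block-diagonal (S_tt, S_xx, S_ss)
    M = inv(np.eye(5) - A)
    return (M @ Sigma @ M.T)[3:, 3:]

A_xt_before = np.array([[0.8], [-0.5]])
A_xt_after = np.array([[0.8], [0.1]])  # hypothetical intervention on the path T -> X2
print(np.diag(var_S_after(A_xt_before)), np.diag(var_S_after(A_xt_after)))
```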



3 Application of Mathematical Optimization Procedures to Intervention Effects

In Section 3.1, we first consider interventions on the path coefficients $\tilde A_{xt}$, and hence on the total effects $\tilde\tau_{st}$, to reduce the variances of the output variables. Next, in Section 3.2, we treat interventions on the means $\tilde\mu_{x;\mathrm{pa}(x)}$ to adjust the mean values of the output variables.
3.1 Application of Mathematical Optimization Procedures to Intervention Effects for Variances



For a given structural equation model, it is often necessary to intervene in the model to reduce the variances of the output variables. In structural equation models, this can be done by changing the path coefficients $A_{xt}$ into $\tilde A_{xt}$ through an intervention. In this section, we show that the problem of finding the intervention that minimizes the weighted sum of the variances can be formulated as a convex quadratic programming problem. This formulation allows us to impose bounds on the intervention, so that we can find practical solutions.

Let us denote by $Y$ the elements of interest in $S$, by $n_y$ the dimension of $Y$, and by $Y_i$ the $i$-th element of $Y$. The variance of $Y_i$ after the intervention, which we denote by $\tilde V[Y_i]$, is the diagonal element of $\tilde V[S]$ corresponding to $Y_i$. Then the minimization of the weighted sum of the variances of $Y$ under the constraint that the elements of $\tilde A_{xt}$ have upper and lower bounds can be formulated as follows:

$$\text{Minimize}\quad \sum_{i=1}^{n_y}\kappa_i\tilde V[Y_i] \tag{27}$$

$$\text{subject to}\quad A_L \le \tilde A_{xt} \le A_U, \tag{28}$$

where $\kappa_1, \ldots, \kappa_{n_y}$ are the weights, $A_L$ and $A_U$ are the matrices whose elements are the lower and upper bounds for $\tilde A_{xt}$, and the inequalities are understood elementwise. We assume that these values are determined appropriately in advance.

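Now we formulate the above problem as a convex quadratic programming problem. First, we neglect the terms $\tau_{sx}\Sigma_{xx;\mathrm{pa}(x)}\tau_{sx}^T$ and $(\tau_{ss} + I_{ss})\Sigma_{ss;\mathrm{pa}(s)}(\tau_{ss} + I_{ss})^T$ in the variance matrix (26), because they do not change when $\tilde A_{xt}$ changes. Let us define the following functions:

$$f_{y_i}(\tilde A_{xt}) \stackrel{\mathrm{def}}{=} \tilde\tau_{y_it}\Sigma_{tt;\mathrm{pa}(t)}\tilde\tau_{y_it}^T = \{\tilde\tau_{y_it}(T \to X \to S) + \tau_{y_it}(T \to S)\}\Sigma_{tt;\mathrm{pa}(t)}\{\tilde\tau_{y_it}(T \to X \to S) + \tau_{y_it}(T \to S)\}^T \quad (i = 1, \ldots, n_y), \tag{29}$$

where $\tilde\tau_{y_it}$, $\tilde\tau_{y_it}(T \to X \to S)$ and $\tau_{y_it}(T \to S)$ are the row vectors of $\tilde\tau_{st}$, $\tilde\tau_{st}(T \to X \to S)$ and $\tau_{st}(T \to S)$ corresponding to $Y_i$. Then the minimization of the objective function in (27) is equivalent to

$$\text{Minimize}\quad \sum_{i=1}^{n_y}\kappa_i f_{y_i}(\tilde A_{xt}).$$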

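Recall that, by (24), $\tilde\tau_{st}(T \to X \to S) = (I_{ss} - A_{ss})^{-1}A_{sx}(I_{xx} - A_{xx})^{-1}\tilde A_{xt}(I_{tt} - A_{tt})^{-1}$. By using the vec operator, the Kronecker product $\otimes$ and the identity $\mathrm{vec}(ABC) = (C^T \otimes A)\mathrm{vec}(B)$ (see Appendix A.1), the column vector $\tilde\tau_{y_it}^T$ in (29) can be written as follows:

$$\begin{aligned}
\tilde\tau_{y_it}^T &= \{\tilde\tau_{y_it}(T \to X \to S) + \tau_{y_it}(T \to S)\}^T\\
&= \{[(I_{ss} - A_{ss})^{-1}]_{y_is}A_{sx}(I_{xx} - A_{xx})^{-1}\tilde A_{xt}(I_{tt} - A_{tt})^{-1}\}^T + \tau_{y_it}(T \to S)^T\\
&= \mathrm{vec}\big([(I_{ss} - A_{ss})^{-1}]_{y_is}A_{sx}(I_{xx} - A_{xx})^{-1}\tilde A_{xt}(I_{tt} - A_{tt})^{-1}\big) + \tau_{y_it}(T \to S)^T\\
&= \big[\{(I_{tt} - A_{tt})^{-1}\}^T \otimes \{[(I_{ss} - A_{ss})^{-1}]_{y_is}A_{sx}(I_{xx} - A_{xx})^{-1}\}\big]\mathrm{vec}(\tilde A_{xt}) + \tau_{y_it}(T \to S)^T\\
&= \big[\{(I_{tt} - A_{tt})^{-1}\}^T \otimes \tau_{y_ix}\big]\mathrm{vec}(\tilde A_{xt}) + \tau_{y_it}(T \to S)^T, \qquad (30)
\end{aligned}$$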
where $[(I_{ss} - A_{ss})^{-1}]_{y_is}$ is the row vector of $(I_{ss} - A_{ss})^{-1}$ corresponding to $Y_i$, $\tau_{y_ix}$ is the row vector of $\tau_{sx}$ corresponding to $Y_i$ (see (6) of Theorem 1), and the third equality uses the fact that the vec of a row vector is its transpose. Let us define the following matrices and column vectors:

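$$Q_i \stackrel{\mathrm{def}}{=} \{(I_{tt} - A_{tt})^{-1}\}^T \otimes \tau_{y_ix} = \{(I_{tt} - A_{tt})^{-1}\}^T \otimes \{[(I_{ss} - A_{ss})^{-1}]_{y_is}A_{sx}(I_{xx} - A_{xx})^{-1}\} \quad (i = 1, \ldots, n_y),$$

$$\gamma \stackrel{\mathrm{def}}{=} \mathrm{vec}(\tilde A_{xt}),$$

$$r_i \stackrel{\mathrm{def}}{=} \tau_{y_it}(T \to S)^T = \{[(I_{ss} - A_{ss})^{-1}]_{y_is}A_{st}(I_{tt} - A_{tt})^{-1}\}^T \quad (i = 1, \ldots, n_y).$$

By using these definitions and (30), the column vector $\tilde\tau_{y_it}^T$ in (29) can be represented as follows:

$$\tilde\tau_{y_it}^T = Q_i\gamma + r_i. \tag{31}$$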
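The step from (30) to (31) rests on the identity $\mathrm{vec}(ABC) = (C^T \otimes A)\mathrm{vec}(B)$ with the column-stacking vec operator. In NumPy, column stacking corresponds to `reshape` with `order="F"`. The sketch below checks the identity in the shapes used here (a row vector standing in for $\tau_{y_ix}$, an $n_x \times n_t$ matrix standing in for $\tilde A_{xt}$, and an $n_t \times n_t$ matrix standing in for $(I_{tt} - A_{tt})^{-1}$); the sizes and values are arbitrary.

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator (Fortran order)."""
    return M.reshape(-1, 1, order="F")

rng = np.random.default_rng(3)
n_x, n_t = 3, 2                          # hypothetical sizes
a_row = rng.normal(size=(1, n_x))        # plays the role of tau_{y_i x}
A_xt_new = rng.normal(size=(n_x, n_t))   # plays the role of A~_xt
C = rng.normal(size=(n_t, n_t))          # plays the role of (I_tt - A_tt)^{-1}

lhs = (a_row @ A_xt_new @ C).T           # the column vector in (30), without r_i
Q = np.kron(C.T, a_row)                  # Q_i = C^T (kron) tau_{y_i x}: n_t x (n_x n_t)
rhs = Q @ vec(A_xt_new)
print(np.allclose(lhs, rhs))
```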

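Hence, we obtain

$$f_{y_i}(\tilde A_{xt}) = (Q_i\gamma + r_i)^T\Sigma_{tt;\mathrm{pa}(t)}(Q_i\gamma + r_i) = \gamma^T(Q_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i)\gamma + (2r_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i)\gamma + r_i^T\Sigma_{tt;\mathrm{pa}(t)}r_i,$$

and

$$\sum_{i=1}^{n_y}\kappa_i f_{y_i}(\tilde A_{xt}) = \gamma^T\Big(\sum_{i=1}^{n_y}\kappa_iQ_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i\Big)\gamma + \Big(\sum_{i=1}^{n_y}2\kappa_ir_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i\Big)\gamma + \sum_{i=1}^{n_y}\kappa_ir_i^T\Sigma_{tt;\mathrm{pa}(t)}r_i.$$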

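Note that the third term on the right-hand side of the above equation is constant with respect to $\gamma$ and can be ignored in the minimization problem (27). Therefore, the minimization problem (27) under the constraint (28) can be represented as the following quadratic programming problem:

$$\begin{aligned}
\text{Minimize}\quad & \gamma^T\Big(\sum_{i=1}^{n_y}\kappa_iQ_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i\Big)\gamma + \Big(\sum_{i=1}^{n_y}2\kappa_ir_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i\Big)\gamma\\
\text{subject to}\quad & \alpha_L \le \gamma \le \alpha_U,
\end{aligned} \qquad (32)$$

where $\alpha_L \stackrel{\mathrm{def}}{=} \mathrm{vec}(A_L)$ and $\alpha_U \stackrel{\mathrm{def}}{=} \mathrm{vec}(A_U)$. This problem is convex provided that the weights $\kappa_i$ are nonnegative, because $\Sigma_{tt;\mathrm{pa}(t)}$ is positive semidefinite and hence so is each $Q_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i$.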
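The sketch below assembles the quadratic and linear terms of (32) from $Q_i$ and $r_i$ for a randomly generated model (the sizes, weights and coefficients are hypothetical, and here $Y = S$) and checks that $\gamma^T H\gamma + c\gamma$ plus the dropped constant reproduces $\sum_i \kappa_i f_{y_i}(\tilde A_{xt})$ computed directly from (24), (25) and (29). The names `H`, `c` and `const` are our own labels for the three terms.

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nx, ns = 2, 2, 2           # hypothetical sizes; Y = S, so n_y = ns
inv = np.linalg.inv

def vec(M):
    """Column-stacking vec operator."""
    return M.reshape(-1, 1, order="F")

A_tt = np.tril(rng.normal(size=(nt, nt)), k=-1)
A_xx = np.tril(rng.normal(size=(nx, nx)), k=-1)
A_ss = np.tril(rng.normal(size=(ns, ns)), k=-1)
A_st = rng.normal(size=(ns, nt))
A_sx = rng.normal(size=(ns, nx))
L = rng.normal(size=(nt, nt))
S_tt = L @ L.T + np.eye(nt)    # Sigma_tt;pa(t): symmetric positive definite
kappa = np.array([1.0, 2.0])   # hypothetical weights kappa_i

Ctt = inv(np.eye(nt) - A_tt)
Css = inv(np.eye(ns) - A_ss)
tau_sx = Css @ A_sx @ inv(np.eye(nx) - A_xx)  # row i is tau_{y_i x}, see (6)
tau_ts = Css @ A_st @ Ctt                     # row i is tau_{y_i t}(T -> S), see (4)

Q = [np.kron(Ctt.T, tau_sx[i:i + 1, :]) for i in range(ns)]  # Q_i: n_t x (n_x n_t)
r = [tau_ts[i:i + 1, :].T for i in range(ns)]                # r_i: n_t x 1

H = sum(k * Qi.T @ S_tt @ Qi for k, Qi in zip(kappa, Q))             # quadratic term of (32)
c = sum(2 * k * ri.T @ S_tt @ Qi for k, Qi, ri in zip(kappa, Q, r))  # linear term (row vector)
const = sum(k * (ri.T @ S_tt @ ri).item() for k, ri in zip(kappa, r))

def objective_qp(gamma):
    """gamma' H gamma + c gamma + constant, with gamma = vec(A~_xt)."""
    return (gamma.T @ H @ gamma + c @ gamma).item() + const

def objective_direct(A_xt_new):
    """sum_i kappa_i f_{y_i}(A~_xt) computed from (24), (25) and (29)."""
    tau_new = Css @ A_sx @ inv(np.eye(nx) - A_xx) @ A_xt_new @ Ctt + tau_ts
    return sum(k * tau_new[i] @ S_tt @ tau_new[i] for i, k in enumerate(kappa))

A_xt_new = rng.normal(size=(nx, nt))  # an arbitrary candidate intervention
print(objective_qp(vec(A_xt_new)), objective_direct(A_xt_new))
```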

The Karush-Kuhn-Tucker (KKT) conditions of the problem (32) are given as follows:

$$\begin{aligned}
&\Big(\sum_{i=1}^{n_y}2\kappa_iQ_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i\Big)\gamma + \Big(\sum_{i=1}^{n_y}2\kappa_ir_i^T\Sigma_{tt;\mathrm{pa}(t)}Q_i\Big)^T - \phi_L + \phi_U = 0,\\
&\phi_L \ge 0,\quad \phi_U \ge 0,\\
&-\gamma + \alpha_L \le 0,\quad \gamma - \alpha_U \le 0,\\
&\phi_L^T(-\gamma + \alpha_L) = 0,\quad \phi_U^T(\gamma - \alpha_U) = 0,
\end{aligned}$$

where the elements of $\phi_L$ and $\phi_U$ are Lagrange multipliers (for more details, see Rockafellar (1996)). Assume that the constraints in (32) satisfy Slater's constraint qualification, i.e., $\alpha_L < \alpha_U$ holds. Then $\gamma$ is optimal if and only if there exist $\phi_L$ and $\phi_U$ that satisfy the above KKT conditions for $\gamma$. Notice that even if $\{\alpha_L\}_i = \{\alpha_U\}_i$ holds for some $i$, the constraints in (32) still satisfy Slater's constraint qualification if the corresponding inequality constraints are regarded as equality constraints.
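Any convex quadratic programming solver can be used for (32). As one simple possibility, not the method prescribed here, the sketch below applies cyclic coordinate descent with clipping to a problem of the form minimize $\gamma^T H\gamma + c\gamma$ subject to $\alpha_L \le \gamma \le \alpha_U$, and then recovers $\phi_L$ and $\phi_U$ from the gradient to check the KKT conditions above. The matrix $H$, the vector $c$ and the bounds are hypothetical numbers, and the solver assumes that every diagonal element of $H$ is positive.

```python
import numpy as np

def box_qp_coordinate_descent(H, c, lo, hi, sweeps=1000, tol=1e-14):
    """Minimize g'Hg + c'g subject to lo <= g <= hi by cyclic coordinate descent.

    H must be symmetric positive semidefinite with H[j, j] > 0 for every j;
    c, lo and hi are 1-D arrays. Each step sets one coordinate to the exact
    minimizer of the objective along that coordinate, clipped to its bounds.
    """
    g = np.clip(np.zeros_like(c, dtype=float), lo, hi)
    for _ in range(sweeps):
        g_old = g.copy()
        for j in range(len(g)):
            # Along coordinate j the objective is H_jj g_j^2 + b_j g_j + const.
            b_j = 2.0 * (H[j] @ g - H[j, j] * g[j]) + c[j]
            g[j] = np.clip(-b_j / (2.0 * H[j, j]), lo[j], hi[j])
        if np.max(np.abs(g - g_old)) < tol:
            break
    return g

def kkt_multipliers(H, c, g):
    """Recover phi_L, phi_U from the gradient 2Hg + c so that grad - phi_L + phi_U = 0."""
    grad = 2.0 * H @ g + c
    return np.maximum(grad, 0.0), np.maximum(-grad, 0.0), grad

# Hypothetical data standing in for the quadratic term, linear term and bounds of (32).
H = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
c = np.array([-4.0, 1.0, 2.0])
lo = np.full(3, -1.0)
hi = np.full(3, 1.0)

g = box_qp_coordinate_descent(H, c, lo, hi)
phi_L, phi_U, grad = kkt_multipliers(H, c, g)
print(g, phi_L, phi_U)
```

With these numbers the upper bound on the first coordinate is active (its multiplier in $\phi_U$ is positive), while the other two coordinates are interior and have zero multipliers.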
Example 4. Let us consider the case where $Y = \{Y_1\}$, $\tau_{y_1x} = [(I_{ss} - A_{ss})^{-1}]_{y_1s}A_{sx}(I_{xx} - A_{xx})^{-1} \ne 0$, $\Sigma_{tt;\mathrm{pa}(t)}$ is non-singular, and no constraint is imposed on $\tilde A_{xt}$, i.e.,

$$\phi_L = 0,\quad \phi_U = 0,\quad \alpha_L \to -\infty,\quad \alpha_U \to \infty.$$

Then, after dividing the first KKT condition by $2\kappa_1 > 0$, the KKT conditions in this case reduce to the following:

$$\begin{aligned}
& Q_1^T\Sigma_{tt;\mathrm{pa}(t)}(Q_1\gamma + r_1) = 0\\
\iff\ & Q_1\gamma + r_1 = 0 \quad (Q_1 \text{ has full row rank, since } (I_{tt} - A_{tt})^{-1} \text{ is non-singular and } \tau_{y_1x} \ne 0)\\
\iff\ & \tilde\tau_{y_1t}^T = \tilde\tau_{y_1t}(T \to X \to S)^T + \tau_{y_1t}(T \to S)^T = 0 \quad (\text{see (31) and (25)}).
\end{aligned}$$

Recall that the first term $\tilde\tau_{y_1t}(T \to X \to S)$ in the last equation is the total effect from $T$ to $Y_1$ through $X$ after the intervention, and the second term $\tau_{y_1t}(T \to S)$ is the total effect from $T$ to $Y_1$ that does not go through $X$. Therefore, if the total effect from $T$ to $Y_1$ through $X$ after the intervention offsets the total effect from $T$ to $Y_1$ that does not go through $X$, then the KKT conditions hold and the variance of $Y_1$ is minimized. □
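Example 4 can be reproduced numerically. Using Example 1 with hypothetical coefficient values and $Y = \{S_2\}$, the sketch builds $Q_1$ and $r_1$ and finds two solutions of $Q_1\gamma + r_1 = 0$: the minimum-norm one returned by `numpy.linalg.lstsq`, and one that, as a hypothetical policy, changes only the coefficient of $T \to X_2$. In both cases the total effect from $T$ to $S_2$ after the intervention vanishes.

```python
import numpy as np

inv = np.linalg.inv
# Example 1 blocks with hypothetical coefficient values; Y = {S2}.
A_tt = np.zeros((1, 1))
A_xx = np.array([[0.0, 0.0], [0.6, 0.0]])
A_st = np.array([[0.0], [0.3]])
A_sx = np.array([[1.2, 0.0], [0.0, 0.7]])
A_ss = np.array([[0.0, 0.0], [-0.4, 0.0]])

Ctt = inv(np.eye(1) - A_tt)
Css = inv(np.eye(2) - A_ss)
tau_y1x = (Css @ A_sx @ inv(np.eye(2) - A_xx))[1:2, :]  # tau_{y_1 x}: row of S2
Q1 = np.kron(Ctt.T, tau_y1x)                             # 1 x 2 here (n_t = 1, n_x = 2)
r1 = (Css @ A_st @ Ctt)[1:2, :].T                        # tau_{y_1 t}(T -> S)^T

# (a) Minimum-norm solution of Q1 gamma + r1 = 0.
gamma_min_norm = np.linalg.lstsq(Q1, -r1, rcond=None)[0]

# (b) Hypothetical policy: keep alpha_{x1 t} = 0.8 and change only alpha_{x2 t}.
a_x1t = 0.8
a_x2t = -(Q1[0, 0] * a_x1t + r1[0, 0]) / Q1[0, 1]
gamma_one_path = np.array([[a_x1t], [a_x2t]])

for gamma in (gamma_min_norm, gamma_one_path):
    # By (31), Q1 gamma + r1 is the total effect from T to S2 after the intervention.
    print(gamma.ravel(), (Q1 @ gamma + r1).item())
```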

3.2 Application of Mathematical Optimization Procedures to Intervention Effects for Means

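We consider interventions on the means $\tilde\mu_{x;\mathrm{pa}(x)}$. From Proposition 2, we obtain the mean of $Y_i$ after the intervention as follows:

$$E[Y_i] = \tau_{y_it}\mu_{t;\mathrm{pa}(t)} + \tau_{y_ix}\tilde\mu_{x;\mathrm{pa}(x)} + [(\tau_{ss} + I_{ss})]_{y_is}\mu_{s;\mathrm{pa}(s)},$$

where $[(\tau_{ss} + I_{ss})]_{y_is}$ is the row vector of $\tau_{ss} + I_{ss}$ corresponding to $Y_i$.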

Suppose that we want to adjust the mean of $Y_i$ to a target value $m_i$ by an intervention that changes $\mu_{x;\mathrm{pa}(x)}$. Then the minimization of the weighted sum of the squared deviations $(E[Y_1] - m_1), \ldots, (E[Y_{n_y}] - m_{n_y})$, under the constraint that the elements of $\mu_{x;\mathrm{pa}(x)}$ have upper and lower bounds, can be formulated as follows:

$$\text{Minimize}\quad \sum_{i=1}^{n_y}\lambda_i\,(E[Y_i] - m_i)^2 \tag{33}$$

$$\text{subject to}\quad \mu_L \le \mu_{x;\mathrm{pa}(x)} \le \mu_U, \tag{34}$$

where $\lambda_1, \ldots, \lambda_{n_y}$ are the weights, and $\mu_L$ and $\mu_U$ are the vectors whose elements are the lower and upper bounds for $\mu_{x;\mathrm{pa}(x)}$. We assume that these values are determined appropriately in advance.

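From Proposition 2, we obtain

$$\begin{aligned}
(E[Y_i] - m_i)^2 &= \bigl(\tau_{y_it}\mu_{t;\mathrm{pa}(t)} + \tau_{y_ix}\mu_{x;\mathrm{pa}(x)} + [(\tau_{ss} + I_{ss})]_{y_is}\mu_{s;\mathrm{pa}(s)} - m_i\bigr)^2 \\
&= \mu_{x;\mathrm{pa}(x)}^{\top}(\tau_{y_ix}^{\top}\tau_{y_ix})\mu_{x;\mathrm{pa}(x)} \\
&\quad + \bigl[2\{\tau_{y_it}\mu_{t;\mathrm{pa}(t)} + [(\tau_{ss} + I_{ss})]_{y_is}\mu_{s;\mathrm{pa}(s)} - m_i\}\tau_{y_ix}\bigr]\mu_{x;\mathrm{pa}(x)} \\
&\quad + \{\tau_{y_it}\mu_{t;\mathrm{pa}(t)} + [(\tau_{ss} + I_{ss})]_{y_is}\mu_{s;\mathrm{pa}(s)} - m_i\}^2.
\end{aligned}$$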
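The expansion above, and the weighted coefficients it produces for the quadratic program, can be checked numerically. In this sketch `c[i]` stands for $\tau_{y_it}\mu_{t;\mathrm{pa}(t)} + [(\tau_{ss} + I_{ss})]_{y_is}\mu_{s;\mathrm{pa}(s)}$, the part of $E[Y_i]$ that does not depend on $\mu_{x;\mathrm{pa}(x)}$; all numbers are random placeholders and the names `P`, `lin`, `const` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_y, n_x = 3, 2
tau_x = rng.normal(size=(n_y, n_x))    # row i plays the role of tau_{y_i x}
c = rng.normal(size=n_y)               # c_i: part of E[Y_i] not depending on mu_x
m = rng.normal(size=n_y)               # targets m_i
lam = rng.uniform(0.5, 2.0, size=n_y)  # weights lambda_i
mu_x = rng.normal(size=n_x)            # a candidate mu_{x;pa(x)}

# Objective (33): weighted squared deviations of E[Y_i] from m_i.
lhs = np.sum(lam * (c + tau_x @ mu_x - m) ** 2)

# Quadratic, linear and constant terms of the expansion, summed with weights lambda_i.
P = sum(lam[i] * np.outer(tau_x[i], tau_x[i]) for i in range(n_y))
lin = sum(lam[i] * 2.0 * (c[i] - m[i]) * tau_x[i] for i in range(n_y))
const = np.sum(lam * (c - m) ** 2)
rhs = mu_x @ P @ mu_x + lin @ mu_x + const
```

Because `P` is a positive-weighted sum of outer products, it is positive semidefinite, which is what makes the resulting program convex.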
Since the third term of the expansion above does not depend on $\mu_{x;\mathrm{pa}(x)}$, only the first and second terms are needed for the minimization in (33). Therefore, the minimization problem (33) under the constraint (34) can be represented as the following convex quadratic programming problem:

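$$\begin{aligned}
\text{Minimize}\quad & \mu_{x;\mathrm{pa}(x)}^{\top}\Bigl(\sum_{i=1}^{n_y}\lambda_i\tau_{y_ix}^{\top}\tau_{y_ix}\Bigr)\mu_{x;\mathrm{pa}(x)} + \Bigl[\sum_{i=1}^{n_y}2\lambda_i\{\tau_{y_it}\mu_{t;\mathrm{pa}(t)} + [(\tau_{ss} + I_{ss})]_{y_is}\mu_{s;\mathrm{pa}(s)} - m_i\}\tau_{y_ix}\Bigr]\mu_{x;\mathrm{pa}(x)} \\
\text{subject to}\quad & \mu_L \le \mu_{x;\mathrm{pa}(x)} \le \mu_U.
\end{aligned}$$

4 Numerical Experiment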




Figure 5: The path diagram of the structural equation model of (35).



To illustrate how the two algorithms in Section 3 work, we consider the following toy model. The model used in this numerical experiment is only a toy: it may contain some inappropriate formulations and should not be taken seriously.

Suppose that the editor of a journal published once a year wanted to stabilize the number of pages of the journal. The editor observed the following four variables:

• $T_1$ - the random variable of the logarithm of the number of advertising campaigns for the journal;

• $T_2$ - the random variable of the logarithm of the number of submissions to the journal;

• $X$ - the random variable of the logarithm of the acceptance rate of the journal;

• $Y$ - the random variable of the logarithm of the number of pages of the journal.

The editor can control the borderline for whether or not to accept a manuscript graded by referees. However, the acceptance rate is a random variable, because the grades of the manuscripts submitted to the journal are determined by the reviewers. Furthermore, the advertising campaign is not the editor's job, and the editor cannot control it. For these variables, the editor constructed a simplified structural equation model, represented by the path diagram in Figure 5 and the following equations:



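$$\begin{aligned}
T_1 &= \mu_{t_1;\mathrm{pa}(t_1)} + \varepsilon_{t_1;\mathrm{pa}(t_1)}, \\
T_2 &= \mu_{t_2;\mathrm{pa}(t_2)} + \alpha_{t_2t_1}T_1 + \varepsilon_{t_2;\mathrm{pa}(t_2)}, \\
X &= \mu_{x;\mathrm{pa}(x)} + \alpha_{xt_1}T_1 + \alpha_{xt_2}T_2 + \varepsilon_{x;\mathrm{pa}(x)}, \\
Y &= \mu_{y;\mathrm{pa}(y)} + \alpha_{yt_2}T_2 + \alpha_{yx}X + \varepsilon_{y;\mathrm{pa}(y)},
\end{aligned} \tag{35}$$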



where

• $\mu_{t;\mathrm{pa}(t)} = (\log 10,\ \log 100)^{\top}$ (the average number of advertising campaigns is 10, and that of submissions is 100 when the effect of the parent is removed),

• $\mu_{x;\mathrm{pa}(x)} = \log(3/10)$ (the average acceptance rate is 3/10 when the effect of $T$ is removed),

• $\mu_{y;\mathrm{pa}(y)} = \log 10$ (the average number of pages per manuscript is 10),

• $\varepsilon_{t_1;\mathrm{pa}(t_1)}$, $\varepsilon_{t_2;\mathrm{pa}(t_2)}$, $\varepsilon_{x;\mathrm{pa}(x)}$ and $\varepsilon_{y;\mathrm{pa}(y)} \sim N(0, 1/10)$,

• $\alpha_{t_2t_1} = 1/10$ and $\alpha_{yt_2} = \alpha_{yx} = 1$.
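The moments of model (35) follow from its reduced form. Writing $V = (T_1, T_2, X, Y)^{\top}$ and collecting the path coefficients in a matrix $A$ (row = child, column = parent), $V = (I - A)^{-1}(\mu + \varepsilon)$, so $E[V] = (I - A)^{-1}\mu$ and $\mathrm{Var}[V] = (I - A)^{-1}\Omega(I - A)^{-\top}$. The sketch assumes mutually independent errors with variance 1/10, $\alpha_{t_2t_1} = 1/10$ and $\mu_{x;\mathrm{pa}(x)} = \log(3/10)$; these values reproduce the variances 0.301 and 0.264 and the value 119.4322 reported below. The function name `reduced_form` is hypothetical.

```python
import numpy as np


def reduced_form(alpha_xt1, alpha_xt2, mu_x, alpha_t2t1=0.1, err_var=0.1):
    """Mean vector and covariance matrix of (T1, T2, X, Y) under model (35)."""
    A = np.zeros((4, 4))      # A[i, j]: path coefficient from variable j to variable i
    A[1, 0] = alpha_t2t1      # T1 -> T2
    A[2, 0] = alpha_xt1       # T1 -> X
    A[2, 1] = alpha_xt2       # T2 -> X
    A[3, 1] = 1.0             # T2 -> Y (alpha_yt2)
    A[3, 2] = 1.0             # X -> Y (alpha_yx)
    mu = np.array([np.log(10.0), np.log(100.0), mu_x, np.log(10.0)])  # mu_{v;pa(v)}
    B = np.linalg.inv(np.eye(4) - A)  # total effects, plus 1 on the diagonal
    mean = B @ mu
    cov = B @ (err_var * np.eye(4)) @ B.T  # independent errors assumed
    return mean, cov


mean0, cov0 = reduced_form(0.0, 0.0, np.log(0.3))      # before the intervention
mean1, cov1 = reduced_form(-0.08, -0.20, np.log(0.3))  # after intervening on alpha_xt
```

Note that `np.exp(mean1[3])` is the exponential of the mean log page count; since $Y$ is normal here, $E[\exp(Y)]$ would be larger by the factor $\exp(\mathrm{Var}[Y]/2)$.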

The last equation in (35) means that the number of pages of the journal is approximately equal to
{average number of pages per manuscript} × {number of submissions} × {acceptance rate}.
At this time, the path coefficients from $T$ to $X$ were $\alpha_{xt_1} = \alpha_{xt_2} = 0$, and so the editor considered intervening on these two coefficients $\alpha_{xt_1}$ and $\alpha_{xt_2}$ to minimize the variance of the number of pages. From Section 3.1, the problem of minimizing the variance can be represented as the following quadratic programming problem:

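$$\begin{aligned}
\text{Minimize}\quad & (\alpha_{xt_1},\ \alpha_{xt_2})\begin{pmatrix}0.1 & 0.01\\ 0.01 & 0.101\end{pmatrix}\begin{pmatrix}\alpha_{xt_1}\\ \alpha_{xt_2}\end{pmatrix} + 2\,(0,\ 1)\begin{pmatrix}0.1 & 0.01\\ 0.01 & 0.101\end{pmatrix}\begin{pmatrix}\alpha_{xt_1}\\ \alpha_{xt_2}\end{pmatrix} \\
\text{subject to}\quad & \alpha_L \le (\alpha_{xt_1},\ \alpha_{xt_2})^{\top} \le \alpha_U,
\end{aligned}$$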

where the constraints on $\alpha_{xt_1}$ and $\alpha_{xt_2}$ were determined by the editor's intuition to avoid too strong a dependency between $T$ and $X$. By solving the above quadratic programming problem, the editor obtained the optimal solution $\alpha_{xt} = (-0.08, -0.20)$, and the variance of $Y$ was reduced from 0.301 to 0.264. However, the editor noticed that the expectation of the number of pages of the journal under the optimal solution $\alpha_{xt} = (-0.08, -0.20)$ is 119.4322 and thought that it might be too small. Next, the editor designated 200 as the appropriate expectation of the number of pages of the journal and considered achieving it by intervention on $\mu_{x;\mathrm{pa}(x)}$. From Section 3.2, this problem can be formulated as the following quadratic programming problem:



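$$\begin{aligned}
\text{Minimize}\quad & \mu_{x;\mathrm{pa}(x)}\cdot 1\cdot\mu_{x;\mathrm{pa}(x)} + 2\bigl\{\tau_{yt}\,(\log 10,\ \log 100)^{\top} + \log 10 - \log 200\bigr\}\cdot 1\cdot\mu_{x;\mathrm{pa}(x)} \\
\text{subject to}\quad & \mu_{x;\mathrm{pa}(x)} \le \log(1/2),
\end{aligned}$$

where $\tau_{yt}$ is the total effect from $T$ to $Y$ under $\alpha_{xt} = (-0.08, -0.20)$, and the constraint on $\mu_{x;\mathrm{pa}(x)}$ prevents the acceptance rate from exceeding 0.5. By solving the above quadratic programming problem, the editor obtained the optimal solution $\mu_{x;\mathrm{pa}(x)} = -0.6931472 = \log(1/2)$. Then, the expectation of the number of pages under the optimal solutions $\alpha_{xt} = (-0.08, -0.20)$ and $\mu_{x;\mathrm{pa}(x)} = -0.6931472 = \log(1/2)$ is 199.0536.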
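The two optimization steps above can be reproduced with a few lines of NumPy, under stated assumptions. The variance objective uses the $2 \times 2$ matrix and the vector $(0, 1)$ from the first quadratic program; the bounds $\alpha_L = (-0.2, -0.2)$ and $\alpha_U = (0.2, 0.2)$ are hypothetical stand-ins for the editor's bounds (any bounds with lower limit $-0.2$ on $\alpha_{xt_2}$ and a range for $\alpha_{xt_1}$ containing $-0.08$ give the same optimum). The page counts 119.4322 and 199.0536 are reproduced as $\exp(E[Y])$, the exponential of the expected log page count. The projected-gradient solver is an illustrative choice, not necessarily the one the editor used.

```python
import numpy as np

LOG10 = np.log(10.0)
SIGMA_T = np.array([[0.1, 0.01], [0.01, 0.101]])      # covariance matrix of (T1, T2)
R1 = np.array([0.0, 1.0])                              # effect of (T1, T2) on Y not through X
E_T = np.array([LOG10, np.log(100.0) + 0.1 * LOG10])   # E[T1], E[T2]
ERR_VAR = 0.1                                          # variance of each error term


def var_y(alpha):
    """Var[Y] as a function of alpha_xt = (alpha_xt1, alpha_xt2), with alpha_yx = 1."""
    tau = np.asarray(alpha, dtype=float) + R1          # effect of T on Y: through X plus direct
    return tau @ SIGMA_T @ tau + 2 * ERR_VAR           # plus Var(eps_x) + Var(eps_y)


def mean_y(alpha, mu_x):
    """E[Y], the expected log page count."""
    return LOG10 + (np.asarray(alpha, dtype=float) + R1) @ E_T + mu_x


def solve_box_qp(P, q, lo, hi, iters=10000):
    """Projected gradient for g'Pg + 2q'g subject to lo <= g <= hi (P positive definite)."""
    step = 1.0 / (2.0 * np.linalg.eigvalsh(P).max())
    g = np.clip(np.zeros(len(q)), lo, hi)
    for _ in range(iters):
        g = np.clip(g - step * 2.0 * (P @ g + q), lo, hi)
    return g


# Step 1: minimise Var[Y] over alpha_xt; the bounds are hypothetical stand-ins.
lo, hi = np.full(2, -0.2), np.full(2, 0.2)
alpha = solve_box_qp(SIGMA_T, SIGMA_T @ R1, lo, hi)

# Step 2: minimise (c + mu_x - log 200)^2 subject to mu_x <= log(1/2);
# the unconstrained minimiser log 200 - c is simply clipped at the bound.
c = mean_y(alpha, 0.0)
mu_x = min(np.log(200.0) - c, np.log(0.5))

pages_before = np.exp(mean_y(alpha, np.log(0.3)))  # with the original mu_x = log(3/10)
pages_after = np.exp(mean_y(alpha, mu_x))
```

Here the unconstrained step-2 minimizer is about $-0.688$, slightly above $\log(1/2)$, so the bound is active; this is why the resulting expectation falls just short of 200.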

As a result, the editor succeeded in minimizing the variance of the number of pages of the 
journal and adjusting the expectation to the appropriate amount. 
What should the editor do if the editor wants to change the expectation of the number of pages while keeping the minimized variance? In this case, all the editor has to do is to re-intervene on $\mu_{x;\mathrm{pa}(x)}$. Further interventions on the path coefficients $\alpha_{xt_1}$ and $\alpha_{xt_2}$ are not needed, because the intervention on $\mu_{x;\mathrm{pa}(x)}$ changes the expectation without changing the minimized variance (though if the constraint on $\mu_{x;\mathrm{pa}(x)}$ is too strict, interventions on $\alpha_{xt_1}$ and $\alpha_{xt_2}$ might be needed to adjust the expectation). This is why we separate the problem into two algorithms as in Section 3. Furthermore, note that this two-step procedure has been used in statistical quality control: Taguchi recommended a two-step optimization for design optimization problems, in which we first maximize the S/N ratio and then adjust the expectation to the target in the next step.
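The claim that re-intervening on $\mu_{x;\mathrm{pa}(x)}$ moves the expectation without touching the minimized variance can be checked with the reduced form of model (35): the intercepts enter only the mean $(I - A)^{-1}\mu$, never the covariance $(I - A)^{-1}\Omega(I - A)^{-\top}$. This sketch assumes the parameter values of (35) with independent $N(0, 1/10)$ errors and $\alpha_{xt}$ fixed at $(-0.08, -0.20)$; the three acceptance-rate levels are arbitrary illustrations and `reduced_form` is a hypothetical helper.

```python
import numpy as np


def reduced_form(alpha_xt, mu_x):
    """Mean and covariance of (T1, T2, X, Y) in model (35); errors i.i.d. N(0, 1/10) assumed."""
    A = np.zeros((4, 4))                # A[i, j]: path coefficient from j to i
    A[1, 0] = 0.1                       # alpha_t2t1
    A[2, 0], A[2, 1] = alpha_xt         # alpha_xt1, alpha_xt2
    A[3, 1] = A[3, 2] = 1.0             # alpha_yt2 = alpha_yx = 1
    mu = np.array([np.log(10.0), np.log(100.0), mu_x, np.log(10.0)])
    B = np.linalg.inv(np.eye(4) - A)
    return B @ mu, B @ (0.1 * np.eye(4)) @ B.T


alpha_xt = (-0.08, -0.20)                                  # fixed after the variance step
mu_levels = np.log([0.3, 0.4, 0.5])                        # illustrative values of mu_{x;pa(x)}
results = [reduced_form(alpha_xt, mu) for mu in mu_levels]
variances = np.array([cov[3, 3] for _, cov in results])   # Var[Y] at each level
means = np.array([mean[3] for mean, _ in results])        # E[Y] at each level
```

Because $\tau_{yx} = 1$ in this model, $E[Y]$ shifts one-for-one with $\mu_{x;\mathrm{pa}(x)}$ while $\mathrm{Var}[Y]$ stays at 0.264.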



5 Conclusion 

We have introduced a matrix representation of total effects and some ideas for their decomposition. We have then shown that the problems of obtaining the optimal intervention that minimizes the variances and of adjusting the expectations can be formulated as convex quadratic programming problems.

In Theorem 3, we assume that $\mathrm{Cov}[T, \varepsilon_{x;\mathrm{pa}(x)}] = \mathrm{Cov}[T, \varepsilon_{s;\mathrm{pa}(s)}] = \mathrm{Cov}[X, \varepsilon_{s;\mathrm{pa}(s)}] = O$. However, this assumption does not hold if there are latent variables that affect both $T$ and $X$, both $T$ and $S$, or both $X$ and $S$. In future work, we intend to extend our results to the case where the assumption of Theorem 3 does not hold.

Throughout this paper, we treat only the case where the structural equation model that represents the true relationships between real objects is given in advance. Is the method introduced in this paper still useful if we do not have the true model? We think the answer is yes. If the given model is not true, then the intervention effect computed by the method in this paper and the intervention effect actually observed will generally differ. Therefore, the intervention, together with the computation of the intervention effect based on the given model, can be used to verify whether the model is true. We also intend to consider this subject in future work.



A Appendix 



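A.1 Kronecker Product and Vec Operator

Let $B = \{b_{ij}\} = [b_1 \ldots b_n]$ be an $m \times n$ matrix and $C$ be a $p \times q$ matrix. The $mp \times nq$ matrix

$$B \otimes C \overset{\mathrm{def}}{=} \begin{pmatrix} b_{11}C & b_{12}C & \cdots & b_{1n}C \\ b_{21}C & b_{22}C & \cdots & b_{2n}C \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1}C & b_{m2}C & \cdots & b_{mn}C \end{pmatrix}$$

is called the Kronecker product of $B$ and $C$.

The vec operator for a matrix is defined as follows:

$$\mathrm{vec}(B) \overset{\mathrm{def}}{=} \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.$$

Let $D$ be an $n \times p$ matrix. Then the following relation holds:

$$\mathrm{vec}(BDC) = (C^{\top} \otimes B)\,\mathrm{vec}(D). \tag{36}$$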
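Relation (36) is easy to confirm numerically. In NumPy, `np.kron` builds the block matrix defined above, and stacking columns corresponds to flattening in Fortran (column-major) order; the random matrices and the helper name `vec` are arbitrary illustrations.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")


rng = np.random.default_rng(1)
m, n, p, q = 2, 3, 4, 5
B = rng.normal(size=(m, n))
D = rng.normal(size=(n, p))
C = rng.normal(size=(p, q))

lhs = vec(B @ D @ C)             # vec(BDC), length m * q
rhs = np.kron(C.T, B) @ vec(D)   # (C' kron B) vec(D)
```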